Abstract — The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bits generation. Hence, a natural question is: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are not efficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his beautiful method is still far from optimal in information efficiency. In this paper, we generalize Blum's algorithm to arbitrary degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time and achieves the information-theoretic upper bound on efficiency.

Index Terms — Random sequence, Random bits generation, 
Markov chain. 



I. Introduction 

The problem of random number generation dates back to von Neumann [8], who considered the problem of simulating an unbiased coin by using a biased coin with unknown probability. He observed that when one focuses on a pair of coin tosses, the events HT and TH have the same probability (H is for 'head' and T is for 'tail'); hence, HT produces the output symbol 0 and TH produces the output symbol 1. The other two possible events, namely, HH and TT, are ignored; that is, they do not produce any output symbols. More efficient algorithms for generating random bits from a biased coin were proposed by Hoeffding and Simons, Elias [3], Stout and Warren, and Peres [11]. Elias [3] was the first to devise an optimal procedure in terms of information efficiency, namely, the expected number of unbiased random bits generated per coin toss is asymptotically equal to the entropy of the biased coin. In addition, Knuth and Yao [7] presented a simple procedure for generating sequences with arbitrary probability distributions from an unbiased coin (the probability of H and T is 1/2). Han and Hoshi [4] generalized this approach and considered the case where the given coin has an arbitrary known bias.
In this paper, we study the problem of generating random bits from an arbitrary and unknown finite Markov chain (the transition matrix is unknown). The input to our problem is a sequence of symbols that represents a random trajectory through the states of the Markov chain; given this input sequence, our algorithm generates an independent unbiased binary sequence called the output sequence. This problem was first studied by Samuelson [13]. His approach was to focus on a single state (ignoring the other states) and treat the transitions out of this state as the input process, hence reducing the problem of correlated sources to the problem of a single 'independent' random source; obviously, this method is not efficient. Elias [3] suggested utilizing the sequences related to all states: producing an 'independent' output sequence from the transitions out of every state and then pasting (concatenating) the collection of output sequences to generate a long output sequence. However, neither Samuelson nor Elias proved that their methods work for arbitrary Markov chains; namely, they did not prove that the transitions out of each state are independent. In fact, Blum [1] probably realized this, as he mentioned that: (i) "Elias's algorithm is excellent, but certain difficulties arise in trying to use it (or the original von Neumann scheme) to generate bits in expected linear time from a Markov chain", and (ii) "Elias has suggested a way to use all the symbols produced by a MC (Markov Chain). His algorithm approaches the maximum possible efficiency for a one-state MC. For a multi-state MC, his algorithm produces arbitrarily long finite sequences. He does not, however, show how to paste these finite sequences together to produce infinitely long independent unbiased sequences." Blum [1] derived a beautiful algorithm to generate random bits from a degree-2 Markov chain in expected linear time by utilizing the von Neumann scheme for generating random bits from biased coin flips. While his approach can be extended to arbitrary out-degrees (the general Markov chain model used in this paper), its information efficiency is still far from optimal due to the low information efficiency of the von Neumann scheme.

In this paper, we generalize Blum's algorithm to arbitrary degree finite Markov chains and combine it with existing methods for efficient generation of unbiased bits from biased coins, such as Elias's method. As a result, we provide the first known algorithm that generates unbiased random bits from arbitrary finite Markov chains, operates in expected linear time and achieves the information-theoretic upper bound on efficiency. Specifically, we propose an algorithm (that we call Algorithm A) that is a simple modification of Elias's suggestion to generate random bits; it operates on finite sequences and its efficiency can asymptotically reach the information-theoretic upper bound for long input sequences. In addition, we propose a second algorithm, called Algorithm B, that is a combination of Blum's and Elias's algorithms; it generates infinitely long sequences of random bits in expected linear time. One of our key ideas for generating random bits is that we exploit equal-probability sequences of the same length. Hence, a natural question is: Can we improve the efficiency by utilizing as many equal-probability sequences as possible? We provide a positive answer to this question and describe Algorithm C, the first known polynomial-time and optimal algorithm (it is optimal in terms of information efficiency for an arbitrary input length) for random bits generation from finite Markov chains.

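In this paper, we use the following notations:

| Notation | Meaning |
|---|---|
| $x_a$ | the $a$-th element of $X$ |
| $X[a]$ | same as $x_a$, the $a$-th element of $X$ |
| $X[a:b]$ | subsequence of $X$ from the $a$-th to the $b$-th element |
| $X^a$ | $X[1:a]$ |
| $X * Y$ | the concatenation of $X$ and $Y$, e.g., $s_1s_2 * s_2s_1 = s_1s_2s_2s_1$ |
| $Y \equiv X$ | $Y$ is a permutation of $X$, e.g., $s_1s_2s_2s_3 \equiv s_3s_2s_2s_1$ |
| $Y \doteq X$ | $Y$ is a permutation of $X$ and $y_{\vert Y\vert} = x_{\vert X\vert}$, namely, the last element is fixed, e.g., $s_1s_2s_2s_3 \doteq s_2s_2s_1s_3$ where $s_3$ is fixed |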


The remainder of this paper is organized as follows. Section II reviews existing schemes for generating random bits from arbitrarily biased coins. Section III discusses the challenge in generating random bits from arbitrary finite Markov chains and presents our main lemma; this lemma characterizes the exit sequences of Markov chains. Algorithm A is presented and analyzed in Section IV; it is related to Elias's ideas for generating random bits from Markov chains. Algorithm B is presented in Section V; it is a generalization of Blum's algorithm. An optimal algorithm, called Algorithm C, is described in Section VI. Finally, Section VII provides numerical evaluations of our algorithms.

II. Generating Random Bits for Biased Coins 

Consider a sequence of length N generated by a biased n-face coin,

$$X = x_1x_2\cdots x_N \in \{s_1, s_2, \ldots, s_n\}^N,$$

such that the probability to get $s_i$ is $p_i$, and $\sum_{i=1}^n p_i = 1$. While we are given the sequence X, the probabilities $p_1, p_2, \ldots, p_n$ are unknown; the question is: How can we efficiently generate an independent and unbiased sequence of 0's and 1's from X? The efficiency (information efficiency) of a generation algorithm is defined as the ratio between the expected length of the output sequence and the length of the input sequence, namely, the expected number of random bits generated per input symbol. In this section we describe three existing solutions for the problem of random bits generation from biased coins.



A. The von Neumann Scheme 

In 1951, von Neumann [8] considered this question for biased coins and described a simple procedure for generating an independent unbiased binary sequence $z_1z_2\ldots$ from the input sequence $X = x_1x_2\ldots$. In his original procedure, the coin is binary; however, it can be simply generalized to the case of an n-face coin: we divide the input sequence into pairs $x_1x_2, x_3x_4, \ldots$ and use the following mapping for each pair:

$$s_is_j\ (i<j) \to 0, \qquad s_is_j\ (i>j) \to 1, \qquad s_is_i \to \phi,$$

where $\phi$ denotes the empty sequence. As a result, by concatenating the outputs of all the pairs, we get a binary sequence that is independent and unbiased. The von Neumann scheme is computationally (very) efficient; however, its information efficiency is far from optimal. For example, when the input sequence is binary, the probability that a pair of input bits generates an output bit (not $\phi$) is $2p_1p_2$; hence the efficiency is $p_1p_2$, which is $\frac{1}{4}$ at $p_1 = p_2 = \frac{1}{2}$ and less elsewhere.
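A minimal Python sketch of the generalized mapping above, assuming the symbols $s_1, \ldots, s_n$ are encoded as the integers $1, \ldots, n$ and that an unpaired last symbol is simply discarded; the function name `von_neumann` is our own.

```python
from typing import List, Sequence


def von_neumann(x: Sequence[int]) -> List[int]:
    """Generalized von Neumann scheme for an n-face coin.

    Symbols s_1..s_n are encoded as integers 1..n. The input is cut into
    non-overlapping pairs (x1 x2), (x3 x4), ...; a pair s_i s_j yields 0 if
    i < j, 1 if i > j, and nothing if i == j. An unpaired last symbol is dropped.
    """
    out: List[int] = []
    for k in range(0, len(x) - 1, 2):
        a, b = x[k], x[k + 1]
        if a < b:
            out.append(0)
        elif a > b:
            out.append(1)
    return out


# Pairs: s1s2 -> 0, s2s1 -> 1, s3s3 -> (nothing), s3s1 -> 1; trailing s2 dropped.
print(von_neumann([1, 2, 2, 1, 3, 3, 3, 1, 2]))  # [0, 1, 1]
```

Each pair $s_is_j$ and its mirror $s_js_i$ occur with the same probability $p_ip_j$, which is why 0 and 1 are equally likely.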

B. The Elias Scheme 

In 1972, Elias [3] proposed an optimal (in terms of efficiency) algorithm as a generalization of the von Neumann scheme; for the sake of completeness we describe it here.

Elias's method is based on the following idea: The possible input sequences of length N can be partitioned into classes such that all the sequences in the same class have the same number of $s_k$'s for every $1 \le k \le n$. Note that for every class, the members of the class have the same probability to be generated. For example, let n = 2 and N = 4; we can divide the $2^N = 16$ possible input sequences into 5 classes:

$$S_0 = \{s_1s_1s_1s_1\}$$
$$S_1 = \{s_1s_1s_1s_2,\ s_1s_1s_2s_1,\ s_1s_2s_1s_1,\ s_2s_1s_1s_1\}$$
$$S_2 = \{s_1s_1s_2s_2,\ s_1s_2s_1s_2,\ s_1s_2s_2s_1,\ s_2s_1s_1s_2,\ s_2s_1s_2s_1,\ s_2s_2s_1s_1\}$$
$$S_3 = \{s_1s_2s_2s_2,\ s_2s_1s_2s_2,\ s_2s_2s_1s_2,\ s_2s_2s_2s_1\}$$
$$S_4 = \{s_2s_2s_2s_2\}$$

Now, our goal is to assign a string of bits (the output) to each possible input sequence, such that any two output sequences Y and Y' with the same length (say k) have the same probability to be generated, namely $\rho_k$ for some $0 \le \rho_k \le 1$. The idea is that for any given class we partition the members of the class into groups whose sizes are powers of 2; to a group with $2^i$ members (for some i) we assign the $2^i$ binary strings of length i. Note that when the class size is odd we have to exclude one member of this class. We now demonstrate the idea by continuing the example above.

Note that in the example above, we cannot assign any bits to the sequence in $S_0$, so if the input sequence is $s_1s_1s_1s_1$, the output sequence should be $\phi$ (the empty sequence). There are 4 sequences in $S_1$ and we assign the binary strings as follows:

$$s_1s_1s_1s_2 \to 00, \quad s_1s_1s_2s_1 \to 01, \quad s_1s_2s_1s_1 \to 10, \quad s_2s_1s_1s_1 \to 11$$

Similarly, for $S_2$, there are 6 sequences that can be divided into a group of 4 and a group of 2:

$$s_1s_1s_2s_2 \to 00, \quad s_1s_2s_1s_2 \to 01, \quad s_1s_2s_2s_1 \to 10, \quad s_2s_1s_1s_2 \to 11$$
$$s_2s_1s_2s_1 \to 0, \quad s_2s_2s_1s_1 \to 1$$

In general, for a class with W members that have not been assigned yet, assign the $2^j$ possible output binary sequences of length j to $2^j$ distinct unassigned members, where $2^j \le W < 2^{j+1}$. Repeat the procedure above for the rest of the members that have not been assigned. Note that when a class has an odd number of members, there will be one and only one member assigned to $\phi$.
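The assignment rule can be sketched in Python as follows. The sketch assumes, as in the example above, that the members of each class are listed in lexicographic order with $s_1 < s_2 < \cdots$ (states encoded as integers), and that groups are carved off from the front of that list, largest power of two first; the names `multinomial`, `lex_rank` and `elias` are hypothetical. It ranks X within its class instead of enumerating the class, but it is a direct, unoptimized implementation.

```python
from collections import Counter
from math import factorial
from typing import Iterable, Sequence


def multinomial(counts: Iterable[int]) -> int:
    """Number of distinct sequences with the given symbol counts."""
    counts = list(counts)
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def lex_rank(x: Sequence[int]) -> int:
    """0-based rank of x among the distinct permutations of its symbols,
    in lexicographic order."""
    remaining = Counter(x)
    rank = 0
    for sym in x:
        for smaller in sorted(remaining):
            if smaller >= sym:
                break
            if remaining[smaller] == 0:
                continue
            # count the class members that place `smaller` at this position
            remaining[smaller] -= 1
            rank += multinomial(remaining.values())
            remaining[smaller] += 1
        remaining[sym] -= 1
    return rank


def elias(x: Sequence[int]) -> str:
    """Elias function (sketch): bit string assigned to x; "" stands for phi."""
    size = multinomial(Counter(x).values())  # W, the size of x's class
    r = lex_rank(x)
    while True:
        j = size.bit_length() - 1            # largest j with 2**j <= W
        block = 1 << j
        if r < block:                        # x lies in the current group of 2**j
            return format(r, "b").zfill(j) if j > 0 else ""
        r -= block                           # skip this group, continue with the rest
        size -= block


print(elias([1, 2, 1, 2]), elias([2, 1, 2, 1]), repr(elias([1, 1, 1, 1])))  # 01 0 ''
```

The outputs reproduce the assignments listed for $S_0$, $S_1$ and $S_2$.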

Given an input sequence X of length N, using the method above, the output sequence can be written as a function of X, denoted by $\Psi_E(X)$ and called the Elias function. In [12], Ryabko and Matchikina showed that the Elias function of an input sequence of length N (that is generated by a biased coin with two faces) is computable in $O(N \log^3 N \log\log N)$ time. We can prove that their conclusion is valid in the general case of a coin with n faces with n > 2.

C. The Peres Scheme 

In 1992, Peres [11] demonstrated that iterating the original von Neumann scheme on the discarded information can asymptotically achieve optimal efficiency. Let's define the function related to the von Neumann scheme as $\Psi_1 : \{0,1\}^* \to \{0,1\}^*$. Then the iterated procedures $\Psi_v$ with $v \ge 2$ are defined inductively. Given an input sequence $x_1x_2\ldots x_{2m}$, let $i_1 < i_2 < \ldots < i_k$ denote all the indices $i \le m$ for which $x_{2i} = x_{2i-1}$; then $\Psi_v$ is defined as

$$\Psi_v(x_1, x_2, \ldots, x_{2m}) = \Psi_1(x_1, x_2, \ldots, x_{2m}) * \Psi_{v-1}(x_1 \oplus x_2, \ldots, x_{2m-1} \oplus x_{2m}) * \Psi_{v-1}(x_{2i_1}, \ldots, x_{2i_k})$$

Note that on the right-hand side of the equation above, the first term corresponds to the random bits generated with the von Neumann scheme, and the second and third terms relate to the symmetric information discarded by the von Neumann scheme.

Finally, we can define $\Psi_v$ for sequences of odd length by

$$\Psi_v(x_1, x_2, \ldots, x_{2m+1}) = \Psi_v(x_1, x_2, \ldots, x_{2m})$$

Surprisingly, this simple iterative procedure achieves the optimal efficiency asymptotically. The computational complexity and memory requirements of this scheme are substantially smaller than those of the Elias scheme. However, a drawback of this scheme is that its generalization to the case of an n-face coin with n > 2 is not obvious.
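A short recursive sketch of $\Psi_v$ for binary input, written for illustration only: bits 0 and 1 play the roles of $s_1$ and $s_2$, so $\Psi_1$ maps 01 to 0 and 10 to 1, and the parameter `v` is the iteration depth. The function name `peres` is our own.

```python
from typing import List, Sequence


def peres(x: Sequence[int], v: int) -> List[int]:
    """Iterated von Neumann procedure Psi_v on a binary sequence (v >= 1).

    Psi_1 is the plain von Neumann scheme (01 -> 0, 10 -> 1, 00/11 -> nothing);
    a trailing unpaired bit is ignored, as in the odd-length rule.
    """
    m = len(x) // 2
    pairs = [(x[2 * i], x[2 * i + 1]) for i in range(m)]
    out = [a for a, b in pairs if a != b]      # Psi_1: the first bit of each unequal pair
    if v == 1 or m == 0:
        return out
    xors = [a ^ b for a, b in pairs]           # x1^x2, x3^x4, ..., x_{2m-1}^x_{2m}
    ties = [b for a, b in pairs if a == b]     # x_{2i} for every pair with x_{2i-1} = x_{2i}
    return out + peres(xors, v - 1) + peres(ties, v - 1)


print(peres([1, 1, 0, 0], 1), peres([1, 1, 0, 0], 2))  # [] [1]
```

With $v = 1$ the pairs 11 and 00 are discarded; with $v = 2$ the discarded pairs still yield one bit through the third term.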



D. Properties of the Schemes 

Let's denote by $\Psi : \{s_1, s_2, \ldots, s_n\}^N \to \{0,1\}^*$ a scheme that generates independent unbiased sequences from any biased coin (with unknown probabilities). Such a scheme can be the von Neumann scheme, the Elias scheme, the Peres scheme or any other scheme. Let X be a sequence of length N generated from an arbitrary biased coin; then a property of $\Psi$ is that for any $Y \in \{0,1\}^*$ and $Y' \in \{0,1\}^*$ with $|Y| = |Y'|$, we have

$$P[\Psi(X) = Y] = P[\Psi(X) = Y']$$

Namely, two output sequences of equal length have equal probability.

That leads to the following property for $\Psi$. It says that given the number of $s_i$'s for all i with $1 \le i \le n$, the number of sequences that yield a binary sequence Y equals the number of sequences that yield Y' if Y and Y' have the same length. It further implies that, conditioned on the number of $s_i$'s for all i with $1 \le i \le n$, the output sequence of $\Psi$ is still independent and unbiased. This property is due to the linear independence of the probability functions of the sequences with different numbers of $s_i$'s.

Lemma 1. Let S be a subset of $\{s_1, s_2, \ldots, s_n\}^N$ that includes all the sequences with the same number of $s_i$'s for all i with $1 \le i \le n$, namely, $k_1, k_2, \ldots, k_n$. Let $B_Y$ denote the set $\{X \mid \Psi(X) = Y\}$. Then for any $Y \in \{0,1\}^*$ and $Y' \in \{0,1\}^*$ with $|Y| = |Y'|$, we have $|S \cap B_Y| = |S \cap B_{Y'}|$.

Proof: In S, the number of $s_i$'s in each sequence is $k_i$ for all $1 \le i \le n$; then we can get that

$$P[\Psi(X) = Y] = \sum_S |S \cap B_Y|\, \beta(S)$$

where the sum runs over all such classes S and

$$\beta(S) = \prod_{i=1}^n p_i^{k_i}$$

Since $P[\Psi(X) = Y] = P[\Psi(X) = Y']$, we have

$$\sum_S \left(|S \cap B_Y| - |S \cap B_{Y'}|\right) \beta(S) = 0$$

The set of polynomials $\bigcup_S \{\beta(S)\}$ is linearly independent in the vector space of functions on [0, 1], so we can conclude that $|S \cap B_Y| = |S \cap B_{Y'}|$. ■

III. Some Properties of Markov Chains 

Our goal is to efficiently generate random bits from a Markov chain with unknown transition probabilities. The paradigm we study is that a Markov chain generates the sequence of states that it is visiting, and this sequence of states is the input sequence to our algorithm for generating random bits. Specifically, we express an input sequence as $X = x_1x_2\ldots x_N$ with $x_i \in \{s_1, s_2, \ldots, s_n\}$, where $\{s_1, s_2, \ldots, s_n\}$ indicate the states of a Markov chain.

One idea is that for a given Markov chain, we can treat each state, say s, as a coin and consider the 'next states' (the states the chain has transitioned to after being at state s) as the results of coin tosses. Namely, we can generate a collection of sequences $\pi(X) = [\pi_1(X), \pi_2(X), \ldots, \pi_n(X)]$, called exit sequences, where $\pi_i(X)$ is the sequence of states following $s_i$ in X, namely,

$$\pi_i(X) = \{x_{j+1} \mid x_j = s_i,\ 1 \le j < N\}$$

TABLE I
Probabilities of exit sequences - an example that simple concatenation does not work.

| Input sequence | Probability | $\Psi(\pi_1(X))$ | $\Psi(\pi_1(X)) * \Psi(\pi_2(X))$ |
|---|---|---|---|
| $s_1s_1s_1s_1$ | $(1-p_1)^3$ | $\phi$ | $\phi$ |
| $s_1s_1s_1s_2$ | $(1-p_1)^2p_1$ | 0 | 0 |
| $s_1s_1s_2s_1$ | $(1-p_1)p_1p_2$ | 0 | 0 |
| $s_1s_1s_2s_2$ | $(1-p_1)p_1(1-p_2)$ | 0 | 0 |
| $s_1s_2s_1s_1$ | $p_1p_2(1-p_1)$ | 1 | 1 |
| $s_1s_2s_1s_2$ | $p_1^2p_2$ | $\phi$ | $\phi$ |
| $s_1s_2s_2s_1$ | $p_1(1-p_2)p_2$ | $\phi$ | 1 |
| $s_1s_2s_2s_2$ | $p_1(1-p_2)^2$ | $\phi$ | $\phi$ |

For example, assume that the input sequence is

$$X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$$

If we consider the states following $s_1$, we get $\pi_1(X)$ as the set of states marked in brackets:

$$X = s_1[s_4]s_2s_1[s_3]s_2s_3s_1[s_1][s_2]s_3s_4s_1$$

Hence, the exit sequences are:

$$\pi_1(X) = s_4s_3s_1s_2, \quad \pi_2(X) = s_1s_3s_3, \quad \pi_3(X) = s_2s_1s_4, \quad \pi_4(X) = s_2s_1$$
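The exit sequences can be computed in a single pass over X. A minimal sketch, assuming states are encoded as integers 1..n; the function name `exit_sequences` is ours.

```python
from typing import Dict, List, Sequence


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    """pi_i(X): the states that immediately follow each visit to s_i.

    The last symbol of X has no successor and therefore contributes nothing.
    """
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for current, nxt in zip(x, x[1:]):
        pi[current].append(nxt)
    return pi


# X = s1 s4 s2 s1 s3 s2 s3 s1 s1 s2 s3 s4 s1, with s_i encoded as i
X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
print(exit_sequences(X, 4))
# {1: [4, 3, 1, 2], 2: [1, 3, 3], 3: [2, 1, 4], 4: [2, 1]}
```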

Lemma 2 (Uniqueness). An input sequence X can be uniquely determined by $x_1$ and $\pi(X)$.

Proof: Given $x_1$ and $\pi(X)$, according to the work of Blum in [1], $x_1x_2\ldots x_N$ can be uniquely constructed in the following way: Initially, set the starting state to $x_1$. Inductively, if $x_i = s_k$, then set $x_{i+1}$ to be the first element in $\pi_k(X)$ and remove the first element of $\pi_k(X)$. Finally, we can uniquely generate the sequence $x_1x_2\ldots x_N$. ■
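The construction in the proof can be written as a short routine. In this sketch (our naming, states encoded as integers), `reconstruct` also reports failure, by returning `None`, when the walk stops while some exit sequence still has unused symbols, i.e., when no trajectory with the given start state and exit sequences exists.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence


def reconstruct(start: int, pi: Dict[int, Sequence[int]]) -> Optional[List[int]]:
    """Rebuild X from x_1 and pi(X) as in the proof of Lemma 2.

    Each step is forced: the next state is the first unused element of the
    current state's exit sequence. Returns None if symbols are left over.
    """
    queues = {s: deque(seq) for s, seq in pi.items()}
    x = [start]
    while queues.get(x[-1]):
        x.append(queues[x[-1]].popleft())
    if any(queues.values()):
        return None
    return x


pi = {1: [4, 3, 1, 2], 2: [1, 3, 3], 3: [2, 1, 4], 4: [2, 1]}
print(reconstruct(1, pi))  # [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
```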

Lemma 3 (Equal-probability). Two input sequences $X = x_1x_2\ldots x_N$ and $Y = y_1y_2\ldots y_N$ with $x_1 = y_1$ have the same probability to be generated if $\pi_i(X) \equiv \pi_i(Y)$ for all $1 \le i \le n$.

Proof: Note that the probability to generate X is

$$P[X] = P[x_1]P[x_2|x_1]\cdots P[x_N|x_{N-1}]$$

and the probability to generate Y is

$$P[Y] = P[y_1]P[y_2|y_1]\cdots P[y_N|y_{N-1}]$$

By permuting the terms in the expressions above, it is not hard to see that P[X] = P[Y] if $x_1 = y_1$ and $\pi_i(X) \equiv \pi_i(Y)$ for all $1 \le i \le n$. Basically, the exit sequences describe the edges that are used in the trajectory in the Markov chain. The trajectories that correspond to X and Y use the same edges with the same multiplicities, hence P[X] = P[Y]. ■
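As a numerical illustration of Lemma 3, the sketch below multiplies transition probabilities along two trajectories whose exit sequences are permutations of each other: X and the sequence Y obtained by permuting $\pi_1(X)$ into $s_1s_2s_3s_4$, which appears later in this section. The 4-state transition matrix is hypothetical, chosen only so that every transition used has non-zero probability; the common factor $P[x_1] = P[y_1]$ is omitted.

```python
from fractions import Fraction as F
from typing import Dict, Sequence, Tuple

Transitions = Dict[Tuple[int, int], F]


def trajectory_probability(x: Sequence[int], P: Transitions) -> F:
    """Product of P[x_{k+1} | x_k] along x (the factor P[x_1] is left out)."""
    prob = F(1)
    for a, b in zip(x, x[1:]):
        prob *= P[(a, b)]
    return prob


# Hypothetical 4-state transition matrix; each row sums to 1.
rows = {
    1: [F(1, 10), F(2, 10), F(3, 10), F(4, 10)],
    2: [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
    3: [F(1, 2), F(1, 6), F(1, 6), F(1, 6)],
    4: [F(1, 3), F(1, 3), F(0), F(1, 3)],
}
P = {(i, j + 1): p for i, row in rows.items() for j, p in enumerate(row)}

X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]  # pi_1(X) = s4 s3 s1 s2
Y = [1, 1, 2, 1, 3, 2, 3, 1, 4, 2, 3, 4, 1]  # pi_1(Y) = s1 s2 s3 s4, other exits unchanged
print(trajectory_probability(X, P) == trajectory_probability(Y, P))  # True
```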
In [13], Samuelson considered a two-state Markov chain, and he pointed out that it may generate unbiased random bits by applying the von Neumann scheme to the exit sequence of state $s_1$. Later, in [3], in order to increase the efficiency, Elias suggested a scheme that uses all the symbols produced by a Markov chain. His main idea was to create the final output sequence by concatenating the output sequences that correspond to $\pi_1(X), \pi_2(X), \ldots$. However, neither Samuelson nor Elias proved that their methods produce random output sequences that are independent and unbiased; in fact, their proposed methods are not correct in some cases. To demonstrate this, we consider: (1) $\Psi(\pi_1(X))$ as the final output; (2) $\Psi(\pi_1(X)) * \Psi(\pi_2(X)) * \cdots$ as the final output. For example, consider the two-state Markov chain in which $P[s_2|s_1] = p_1$ and $P[s_1|s_2] = p_2$, as shown in Fig. 1.



Fig. 1. An example of a Markov chain with two states.

Assume that an input sequence of length N = 4 is generated from this Markov chain and the starting state is $s_1$; then the probabilities of the possible input sequences and their corresponding output sequences are given in Table I. In the table we can see that the probabilities to produce 0 or 1 are different for some $p_1$ and $p_2$ in both methods, presented in columns 3 and 4, respectively.
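The bias can be reproduced by exact enumeration. This sketch is our own illustration and substitutes the generalized von Neumann scheme for $\Psi$ to stay short, so individual entries may differ from Table I (for example, in the row $s_1s_1s_1s_2$ the von Neumann scheme discards the pair $s_1s_1$); the values $p_1 = 1/3$ and $p_2 = 1/2$ are arbitrary.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def von_neumann(x: Sequence[int]) -> Tuple[int, ...]:
    """Generalized von Neumann scheme, used here as Psi."""
    return tuple(0 if x[k] < x[k + 1] else 1
                 for k in range(0, len(x) - 1, 2) if x[k] != x[k + 1])


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for a, b in zip(x, x[1:]):
        pi[a].append(b)
    return pi


def first_state_only(x: Sequence[int]) -> Tuple[int, ...]:
    """Method (1): Psi(pi_1(X))."""
    return von_neumann(exit_sequences(x, 2)[1])


def direct_concatenation(x: Sequence[int]) -> Tuple[int, ...]:
    """Method (2): Psi(pi_1(X)) * Psi(pi_2(X))."""
    pi = exit_sequences(x, 2)
    return von_neumann(pi[1]) + von_neumann(pi[2])


def output_distribution(N: int, p1: Fraction, p2: Fraction,
                        method: Callable[[Sequence[int]], Tuple[int, ...]]):
    """Exact output distribution over all length-N trajectories from s_1
    of the chain in Fig. 1 (P[s2|s1] = p1, P[s1|s2] = p2)."""
    P = {(1, 1): 1 - p1, (1, 2): p1, (2, 1): p2, (2, 2): 1 - p2}
    dist: Dict[Tuple[int, ...], Fraction] = defaultdict(Fraction)
    for tail in product((1, 2), repeat=N - 1):
        x = (1,) + tail
        prob = Fraction(1)
        for a, b in zip(x, x[1:]):
            prob *= P[(a, b)]
        dist[method(x)] += prob
    return dist


p1, p2 = Fraction(1, 3), Fraction(1, 2)
for method in (first_state_only, direct_concatenation):
    d = output_distribution(4, p1, p2, method)
    print(method.__name__, d[(0,)], d[(1,)])
# first_state_only 2/9 1/9
# direct_concatenation 2/9 7/36
```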

The problem of generating random bits from an arbitrary Markov chain is challenging, as Blum said in [1]: "Elias's algorithm is excellent, but certain difficulties arise in trying to use it (or the original von Neumann scheme) to generate random bits in expected linear time from a Markov chain". It seems that the exit sequence of a state is independent since each exit of the state will not affect the other exits. However, this is not always true when the length of the input sequence is given, say N. Let's still consider the example of the two-state Markov chain in Fig. 1. Assume the starting state of this Markov chain is $s_1$; if $1 - p_1 > 0$, then with non-zero probability we have

$$\pi_1(X) = s_1s_1\ldots s_1$$

whose length is N − 1. But it is impossible to have

$$\pi_1(X) = s_2s_2\ldots s_2$$

of length N − 1, since $|\pi_1(X)| = N - 1$ forces $x_1 = \cdots = x_{N-1} = s_1$, so every exit of $s_1$ except possibly the last one returns to $s_1$. That means $\pi_1(X)$ is not an independent sequence. The main reason is that although each exit of a state will not affect the other exits, it will affect the length of the exit sequence. In fact, $\pi_1(X)$ is an independent sequence if the length of $\pi_1(X)$ is given, instead of the length of X.

In this paper, we consider this problem from another perspective. According to Lemma 3, we know that permuting the exit sequences does not change the probability of a sequence; however, the permuted sequence has to correspond to a trajectory in the Markov chain. The reason for this contingency is that in some cases the permuted sequence does not correspond to a trajectory. Consider the following example,

$$X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$$

and

$$\pi(X) = [s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_2s_1]$$

If we permute the last exit sequence $s_2s_1$ to $s_1s_2$, we cannot get a new sequence such that its starting state is $s_1$ and its exit sequences are

$$[s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_1s_2]$$

This can be verified by attempting to construct the sequence using Blum's method (which is given in the proof of Lemma 2). Notice that if we permute the first exit sequence $s_4s_3s_1s_2$ into $s_1s_2s_3s_4$, we can find such a new sequence, which is

$$Y = s_1s_1s_2s_1s_3s_2s_3s_1s_4s_2s_3s_4s_1$$

This observation motivated us to study the characterization of exit sequences that are feasible in Markov chains (or finite state machines).

Definition 1 (Feasibility). Given a Markov chain, a starting state $s_\alpha$ and a collection of sequences $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$, we say that $(s_\alpha, \Lambda)$ is feasible if and only if there exists a sequence X that corresponds to a trajectory in the Markov chain such that $x_1 = s_\alpha$ and $\pi(X) = \Lambda$.

Based on the definition of feasibility, we present the main technical lemma of the paper. Repeating the notation from the beginning of the paper, we say that a sequence Y is a tail-fixed permutation of X, denoted as $Y \doteq X$, if and only if (1) Y is a permutation of X, and (2) X and Y have the same last element, namely, $y_{|Y|} = x_{|X|}$.

Lemma 4 (Main Lemma: Feasibility and equivalence of exit sequences). Given a starting state $s_\alpha$ and two collections of sequences $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$ and $\Gamma = [\Gamma_1, \Gamma_2, \ldots, \Gamma_n]$ such that $\Lambda_i \doteq \Gamma_i$ (tail-fixed permutation) for all $1 \le i \le n$. Then $(s_\alpha, \Lambda)$ is feasible if and only if $(s_\alpha, \Gamma)$ is feasible.

The proof of this main lemma will be given in the Appendix. According to the main lemma, we have the following equivalent statement.

Lemma 5 (Feasible permutations of exit sequences). Given an input sequence $X = x_1x_2\ldots x_N$ with $x_N = s_\chi$ that is produced from a Markov chain, assume that $[\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$ is an arbitrary collection of exit sequences that corresponds to the exit sequences of X as follows:

1) $\Lambda_i$ is a permutation ($\equiv$) of $\pi_i(X)$, for $i = \chi$.

2) $\Lambda_i$ is a tail-fixed permutation ($\doteq$) of $\pi_i(X)$, for $i \neq \chi$.

Then there exists a feasible sequence $X' = x'_1x'_2\ldots x'_N$ such that $x'_1 = x_1$ and $\pi(X') = [\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$. For this X', we have $x'_N = x_N$.

One might reason that Lemma 5 is stronger than the main lemma (Lemma 4). However, we will show that these two lemmas are equivalent. It is obvious that if the statement in Lemma 5 is true, then the main lemma is also true. Now we show that if the main lemma is true then the statement in Lemma 5 is also true.

Proof: Given $X = x_1x_2\ldots x_N$, let's add one more symbol $s_{n+1}$ to the end of X ($s_{n+1}$ is different from all the states in X); then we get a new sequence $x_1x_2\ldots x_Ns_{n+1}$, whose exit sequences are

$$[\pi_1(X), \pi_2(X), \ldots, \pi_\chi(X)s_{n+1}, \ldots, \pi_n(X), \phi]$$

Since $\Lambda_\chi s_{n+1} \doteq \pi_\chi(X)s_{n+1}$ and $\Lambda_i \doteq \pi_i(X)$ for $i \neq \chi$, the main lemma tells us that there exists another sequence $x'_1x'_2\ldots x'_Nx'_{N+1}$ such that its exit sequences are

$$[\Lambda_1, \Lambda_2, \ldots, \Lambda_\chi s_{n+1}, \ldots, \Lambda_n, \phi]$$

and $x'_1 = x_1$. Since $s_{n+1}$ appears only once and has an empty exit sequence, the last symbol of this sequence is $s_{n+1}$, i.e., $x'_{N+1} = s_{n+1}$. As a result, because $s_{n+1}$ appears only in the exit sequence of $s_\chi$, we have $x'_N = s_\chi$.

Now, by removing the last element from $x'_1x'_2\ldots x'_Nx'_{N+1}$, we get a new sequence $X' = x'_1x'_2\ldots x'_N$ such that its exit sequences are

$$[\Lambda_1, \Lambda_2, \ldots, \Lambda_\chi, \ldots, \Lambda_n]$$

and $x'_1 = x_1$. We also have $x'_N = s_\chi$.

This completes the proof. ■
We demonstrate the result above by considering the example at the beginning of this section. Let

$$X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$$

with χ = 1; its exit sequences are given by

$$[s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_2s_1]$$

After permuting all the exit sequences (for i ≠ 1, we keep the last element of the i-th sequence fixed), we get a new group of exit sequences

$$[s_1s_2s_3s_4,\ s_3s_1s_3,\ s_1s_2s_4,\ s_2s_1]$$

Based on these new exit sequences, we can generate a new input sequence

$$X' = s_1s_1s_2s_3s_1s_3s_2s_1s_4s_2s_3s_4s_1$$

This accords with the statements above.
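Lemma 5 can also be checked exhaustively on this example. The sketch below (our own illustration; all function names are hypothetical) enumerates every collection allowed by the lemma, i.e., all permutations of $\pi_\chi(X)$ and all tail-fixed permutations of the other exit sequences, rebuilds each candidate with Blum's method, and asserts that it is feasible, has length N and ends in $s_\chi$.

```python
from collections import deque
from itertools import permutations, product
from typing import Dict, List, Optional, Sequence, Tuple


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for a, b in zip(x, x[1:]):
        pi[a].append(b)
    return pi


def reconstruct(start: int, pi: Dict[int, Sequence[int]]) -> Optional[List[int]]:
    """Blum's reconstruction; None if (start, pi) is not feasible."""
    queues = {s: deque(seq) for s, seq in pi.items()}
    x = [start]
    while queues.get(x[-1]):
        x.append(queues[x[-1]].popleft())
    return None if any(queues.values()) else x


def permutations_of(seq: Sequence[int]) -> List[Tuple[int, ...]]:
    return sorted(set(permutations(seq)))


def tail_fixed_permutations(seq: Sequence[int]) -> List[Tuple[int, ...]]:
    if not seq:
        return [()]
    return [p + (seq[-1],) for p in permutations_of(seq[:-1])]


def check_lemma5(x: Sequence[int], n: int) -> int:
    """Try every collection allowed by Lemma 5; return how many were checked."""
    pi, chi = exit_sequences(x, n), x[-1]
    choices = [permutations_of(pi[i]) if i == chi else tail_fixed_permutations(pi[i])
               for i in range(1, n + 1)]
    checked = 0
    for combo in product(*choices):
        x_new = reconstruct(x[0], dict(zip(range(1, n + 1), combo)))
        assert x_new is not None and len(x_new) == len(x) and x_new[-1] == x[-1]
        checked += 1
    return checked


X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
print(check_lemma5(X, 4))  # 96 = 24 * 2 * 2 * 1 collections, all feasible
```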
IV. Algorithm A: Modification of Elias's Suggestion

In the section above, we saw that Elias suggested pasting the outputs of different exit sequences together as the final output, but simple direct concatenation does not always work. By modifying the method of pasting these outputs, we get Algorithm A, which generates unbiased random bits from any Markov chain.

Algorithm A

Input: A sequence $X = x_1x_2\ldots x_N$ produced by a Markov chain, where $x_i \in S = \{s_1, s_2, \ldots, s_n\}$.
Output: A sequence of 0's and 1's.
Main Function:
  Suppose $x_N = s_\chi$.
  for i := 1 to n do
    if i = χ then
      Output $\Psi(\pi_i(X))$.
    else
      Output $\Psi(\pi_i(X)^{|\pi_i(X)|-1})$.
    end if
  end for

Comment: (1) $\Psi$ can be any scheme that generates random bits from biased coins. For example, we can use the Elias function. (2) When i = χ, we can also output $\Psi(\pi_i(X)^{|\pi_i(X)|-1})$ for simplicity, but the efficiency may be reduced a little.
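A compact Python sketch of Algorithm A follows. For brevity it plugs in the generalized von Neumann scheme as $\Psi$ (comment (1) allows any scheme; Theorem 7 below assumes the Elias function, which could be passed as `psi` instead). States are encoded as integers and the function names are our own.

```python
from typing import Callable, Dict, List, Sequence


def von_neumann(x: Sequence[int]) -> List[int]:
    """Generalized von Neumann scheme, used here as Psi."""
    return [0 if x[k] < x[k + 1] else 1
            for k in range(0, len(x) - 1, 2) if x[k] != x[k + 1]]


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for a, b in zip(x, x[1:]):
        pi[a].append(b)
    return pi


def algorithm_a(x: Sequence[int], n: int,
                psi: Callable[[Sequence[int]], List[int]] = von_neumann) -> List[int]:
    """Algorithm A: Psi on the full exit sequence of the last state s_chi,
    and on every other exit sequence with its last symbol removed."""
    chi = x[-1]
    pi = exit_sequences(x, n)
    out: List[int] = []
    for i in range(1, n + 1):
        seq = pi[i] if i == chi else pi[i][:-1]
        out.extend(psi(seq))
    return out


X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
print(algorithm_a(X, 4))  # [1, 0, 0, 1]
```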

The only difference between Algorithm A and direct concatenation is that Algorithm A ignores the last symbols of some exit sequences. Let's go back to the example of the two-state Markov chain with $P[s_2|s_1] = p_1$ and $P[s_1|s_2] = p_2$ in Fig. 1, which demonstrates that direct concatenation does not always work well. Here, we still assume that an input sequence of length N = 4 is generated from this Markov chain and the starting state is $s_1$; then the probability of each possible input sequence and its corresponding output sequence (based on Algorithm A) are given by:



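| Input sequence | Probability | Output sequence |
|---|---|---|
| $s_1s_1s_1s_1$ | $(1-p_1)^3$ | $\phi$ |
| $s_1s_1s_1s_2$ | $(1-p_1)^2p_1$ | $\phi$ |
| $s_1s_1s_2s_1$ | $(1-p_1)p_1p_2$ | 0 |
| $s_1s_1s_2s_2$ | $(1-p_1)p_1(1-p_2)$ | $\phi$ |
| $s_1s_2s_1s_1$ | $p_1p_2(1-p_1)$ | 1 |
| $s_1s_2s_1s_2$ | $p_1^2p_2$ | $\phi$ |
| $s_1s_2s_2s_1$ | $p_1(1-p_2)p_2$ | $\phi$ |
| $s_1s_2s_2s_2$ | $p_1(1-p_2)^2$ | $\phi$ |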





We can see that when the input sequence length is N = 4, a bit 0 and a bit 1 have the same probability to be generated, namely $p_1p_2(1-p_1)$, and no longer sequences are generated. In this case, the output sequence is independent and unbiased.
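The same conclusion can be checked mechanically for longer inputs by enumerating all trajectories with exact rational arithmetic. In this sketch (our own illustration) $\Psi$ is the generalized von Neumann scheme, the transition probabilities $p_1 = 1/3$, $p_2 = 1/5$ are arbitrary, and `is_unbiased` tests that, for every output length k that occurs, all $2^k$ bit strings occur with one common probability.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Chain = Dict[int, Dict[int, F]]


def von_neumann(x: Sequence[int]) -> List[int]:
    return [0 if x[k] < x[k + 1] else 1
            for k in range(0, len(x) - 1, 2) if x[k] != x[k + 1]]


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for a, b in zip(x, x[1:]):
        pi[a].append(b)
    return pi


def algorithm_a(x: Sequence[int], n: int) -> List[int]:
    pi, chi = exit_sequences(x, n), x[-1]
    return [bit for i in range(1, n + 1)
            for bit in von_neumann(pi[i] if i == chi else pi[i][:-1])]


def direct_concatenation(x: Sequence[int], n: int) -> List[int]:
    pi = exit_sequences(x, n)
    return [bit for i in range(1, n + 1) for bit in von_neumann(pi[i])]


def output_distribution(P: Chain, N: int, start: int,
                        scheme: Callable[[Sequence[int], int], List[int]]
                        ) -> Dict[Tuple[int, ...], F]:
    """Exact distribution of scheme(X) over all length-N trajectories from `start`."""
    n = len(P)
    dist: Dict[Tuple[int, ...], F] = defaultdict(F)
    for tail in product(range(1, n + 1), repeat=N - 1):
        x = (start,) + tail
        prob = F(1)
        for a, b in zip(x, x[1:]):
            prob *= P[a][b]
        if prob:
            dist[tuple(scheme(x, n))] += prob
    return dist


def is_unbiased(dist: Dict[Tuple[int, ...], F]) -> bool:
    by_length: Dict[int, List[F]] = defaultdict(list)
    for y, prob in dist.items():
        by_length[len(y)].append(prob)
    return all(len(ps) == 2 ** k and len(set(ps)) == 1 for k, ps in by_length.items())


p1, p2 = F(1, 3), F(1, 5)
two_state = {1: {1: 1 - p1, 2: p1}, 2: {1: p2, 2: 1 - p2}}
print(is_unbiased(output_distribution(two_state, 4, 1, algorithm_a)))           # True
print(is_unbiased(output_distribution(two_state, 4, 1, direct_concatenation)))  # False
```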

In order to prove that all the sequences generated by Algorithm A are independent and unbiased, we need to show that any two sequences Y and Y' of the same length have the same probability to be generated.

Theorem 6 (Algorithm A). Let the sequence generated by a Markov chain be used as input to Algorithm A; then the output of Algorithm A is an independent unbiased sequence.

Proof: Let's first divide all the possible sequences in $\{s_1, s_2, \ldots, s_n\}^N$ into groups, and use G to denote the set of the groups. Two sequences X and X' are in the same group if and only if

1) $x'_1 = x_1$ and $x'_N = x_N = s_\chi$ for some χ.

2) If $i = \chi$, $\pi_i(X') \equiv \pi_i(X)$.

3) If $i \neq \chi$, $\pi_i(X') \doteq \pi_i(X)$.

We will show that for each group $S \in G$, the number of sequences that generate Y equals the number of sequences that generate Y' if Y and Y' have the same length, i.e., $|S \cap B_Y| = |S \cap B_{Y'}|$ if $|Y| = |Y'|$, where $B_Y$ is the set of sequences of length N that yield Y.

Now, given a group S, if $i = \chi$ let's define $S_i$ as the set of all the permutations of $\pi_i(X)$ for $X \in S$, and if $i \neq \chi$ let's define $S_i$ as the set of all the permutations of $\pi_i(X)^{|\pi_i(X)|-1}$ for $X \in S$. According to Lemma 1, we know that for any $Y, Y' \in \{0,1\}^l$, there are the same number of members in $S_i$ which generate Y and Y'. So we can use $|S_i(l)|$ to denote the number of members in $S_i$ which generate a certain binary sequence of length l (e.g., Y).

By Lemma 5 and Lemma 2, the members of S are in one-to-one correspondence with the tuples of $S_1 \times S_2 \times \cdots \times S_n$. Hence, letting $l_1, l_2, \ldots, l_n$ be non-negative integers, we have

$$|S \cap B_Y| = \sum_{l_1 + \cdots + l_n = |Y|} \prod_{i=1}^n |S_i(l_i)|$$

where each combination $(l_1, l_2, \ldots, l_n)$ is a partition of the length of Y.

Similarly, we also have

$$|S \cap B_{Y'}| = \sum_{l_1 + \cdots + l_n = |Y'|} \prod_{i=1}^n |S_i(l_i)|$$

which tells us that $|S \cap B_Y| = |S \cap B_{Y'}|$ if $|Y| = |Y'|$.

Note that all the sequences in the same group S have the same probability to be generated (Lemma 3). So when $|Y| = |Y'|$, the probability to generate Y is

$$P[X \in B_Y] = \sum_{S \in G} \sum_{X \in S} P[X] \frac{|S \cap B_Y|}{|S|} = \sum_{S \in G} \sum_{X \in S} P[X] \frac{|S \cap B_{Y'}|}{|S|} = P[X \in B_{Y'}]$$

which implies that the output sequence is independent and unbiased. ■

Theorem 7 (Efficiency). Let X be a sequence of length N generated by a Markov chain, which is used as input to Algorithm A. Let $\Psi$ in Algorithm A be Elias's function. Suppose the length of its output sequence is M; then the limiting efficiency $\eta_N = \frac{E[M]}{N}$ as $N \to \infty$ realizes the upper bound $\frac{H(X)}{N}$.

Proof: Here, the upper bound $\frac{H(X)}{N}$ is provided by Elias [3]. We can use the same argument as in Elias's paper [3] to prove this theorem.

Let $X_i$ denote the next state of $s_i$. Obviously, $X_i$ is a random variable for $1 \le i \le n$, whose entropy is denoted as $H(X_i)$. Let $U = (u_1, u_2, \ldots, u_n)$ denote the stationary distribution of the Markov chain; then we have

$$\lim_{N \to \infty} \frac{H(X)}{N} = \sum_{i=1}^n u_i H(X_i)$$

When $N \to \infty$, there exists an $\epsilon_N$ with $\epsilon_N \to 0$ such that with probability $1 - \epsilon_N$, $|\pi_i(X)| > (u_i - \epsilon_N)N$ for all $1 \le i \le n$. Using Algorithm A, the sequence to which $\Psi$ is applied for state $s_i$ has at least $|\pi_i(X)| - 1$ symbols, so the expected length of the output sequence is bounded below by

$$E[M] \ge \sum_{i=1}^n (1 - \epsilon_N)\left((u_i - \epsilon_N)N - 1\right)\eta_i$$

where $\eta_i$ is the efficiency of $\Psi$ when the input is $\pi_i(X)$ or $\pi_i(X)^{|\pi_i(X)|-1}$. According to Theorem 2 in Elias's paper [3], we know that as $|\pi_i(X)| \to \infty$, $\eta_i \to H(X_i)$. Then we have

$$\lim_{N \to \infty} \frac{E[M]}{N} \ge \lim_{N \to \infty} \frac{\sum_{i=1}^n (1 - \epsilon_N)\left((u_i - \epsilon_N)N - 1\right)H(X_i)}{N} = \lim_{N \to \infty} \frac{H(X)}{N}$$

At the same time, $\frac{E[M]}{N}$ is upper bounded by $\frac{H(X)}{N}$. So we get

$$\lim_{N \to \infty} \frac{E[M]}{N} = \lim_{N \to \infty} \frac{H(X)}{N}$$

which completes the proof. ■

Given an input sequence, it is efficient to generate independent unbiased sequences using Algorithm A. However, it has some limitations: (1) The complete input sequence has to be stored. (2) For a long input sequence it is computationally intensive, as its cost depends on the input length. (3) The method works for finite-length sequences and does not lend itself to stream processing. In order to address these limitations we propose two variants of Algorithm A.

In the first variant of Algorithm A, instead of applying $\Psi$ directly to $\Lambda_i = \pi_i(X)$ for $i = \chi$ (or $\Lambda_i = \pi_i(X)^{|\pi_i(X)|-1}$ for $i \neq \chi$), we first split $\Lambda_i$ into several segments with lengths $k_{i1}, k_{i2}, \ldots$ and then apply $\Psi$ to all of the segments separately. It can be proved that this variant of Algorithm A can generate independent unbiased sequences from an arbitrary Markov chain, as long as $k_{i1}, k_{i2}, \ldots$ do not depend on the order of elements in each exit sequence. For example, we can split $\Lambda_i$ into two segments of lengths $\lfloor |\Lambda_i|/2 \rfloor$ and $\lceil |\Lambda_i|/2 \rceil$, or into three segments of lengths $(a, a, |\Lambda_i| - 2a)$, and so on. Generally, the shorter each segment is, the faster we can obtain the final output, but at the same time we may have to sacrifice a little information efficiency.
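A sketch of the first variant under simple assumptions of ours: every $\Lambda_i$ is cut into consecutive segments of a fixed length `seg_len` (the last segment may be shorter), so the cut points depend only on $|\Lambda_i|$, and $\Psi$ is again the generalized von Neumann scheme. Because that scheme already works pair by pair, an even `seg_len` leaves its output unchanged; the variant is meant to pay off with block schemes such as the Elias function.

```python
from typing import Dict, List, Sequence


def von_neumann(x: Sequence[int]) -> List[int]:
    return [0 if x[k] < x[k + 1] else 1
            for k in range(0, len(x) - 1, 2) if x[k] != x[k + 1]]


def exit_sequences(x: Sequence[int], n: int) -> Dict[int, List[int]]:
    pi: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for a, b in zip(x, x[1:]):
        pi[a].append(b)
    return pi


def algorithm_a_segmented(x: Sequence[int], n: int, seg_len: int) -> List[int]:
    """First variant of Algorithm A (sketch): Psi is applied to consecutive
    segments of each Lambda_i; the cut points depend only on |Lambda_i|,
    never on the order of its elements."""
    pi, chi = exit_sequences(x, n), x[-1]
    out: List[int] = []
    for i in range(1, n + 1):
        lam = pi[i] if i == chi else pi[i][:-1]
        for start in range(0, len(lam), seg_len):
            out.extend(von_neumann(lam[start:start + seg_len]))
    return out


X = [1, 4, 2, 1, 3, 2, 3, 1, 1, 2, 3, 4, 1]
print(algorithm_a_segmented(X, 4, 2))  # [1, 0, 0, 1]
print(algorithm_a_segmented(X, 4, 3))  # [1, 0, 1]
```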

The second variant of Algorithm A is based on the following idea: for a given sequence from a Markov chain, we can split it into some shorter sequences such that they are independent of each other. We can then apply Algorithm A to all of the sequences and concatenate their output sequences into the final one. In order to do this, given a sequence $X = x_1x_2\ldots$, we can use $x_1 = s_\alpha$ as a special state. For example, in practice, we can set a constant $k$. If there exists a minimal integer $i$ such that $x_i = s_\alpha$ and $i > k$, then we can split $X$ into two sequences $x_1x_2\ldots x_i$ and $x_ix_{i+1}\ldots$ (note that both of the sequences contain the element $x_i$). For the second sequence $x_ix_{i+1}\ldots$, we can repeat the same procedure. Iteratively, we can split a sequence $X$ into several sequences such that they are independent of each other. These sequences, with the exception of the last one, start and end with $s_\alpha$, and their lengths are usually slightly longer than $k$.
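The following is a minimal sketch of the splitting rule in the second variant. It assumes that states are integers and that $k \ge 1$; `split_at_returns` is a hypothetical name. Each cut is made at the first return to $s_\alpha = x_1$ after position $k$ of the current piece, and the cut symbol is shared by both pieces.

```python
def split_at_returns(xs, k):
    """Second variant: split X at returns to s_alpha = x_1.

    Within the current piece (1-indexed), find the minimal i > k with
    x_i = s_alpha, cut there, and start the next piece at that same symbol.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    alpha = xs[0]
    pieces, start = [], 0
    while True:
        cut = next((t for t in range(start + k, len(xs)) if xs[t] == alpha), None)
        if cut is None:
            pieces.append(xs[start:])  # the last piece need not end in s_alpha
            return pieces
        pieces.append(xs[start:cut + 1])
        start = cut
```

Each returned piece can then be given to Algorithm A on its own, and the outputs concatenated.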

V. Algorithm B : Generalization of Blum's Algorithm

Blum proposed a beautiful algorithm to generate an independent unbiased sequence of 0's and 1's from any Markov chain by extending the von Neumann scheme. His algorithm can deal with infinitely long sequences and uses only constant space and expected linear time. The only drawback of his algorithm is that its efficiency is still far from the information-theoretic upper bound, due to the limitation (compared to the Elias algorithm) of the von Neumann scheme. In this section, we generalize Blum's algorithm by replacing the von Neumann scheme with Elias's. As a result, we get Algorithm B: it maintains some good properties of Blum's algorithm and its efficiency approaches the information-theoretic upper bound.

Algorithm B

Input: A sequence (or a stream) $x_1x_2\ldots$ produced by a Markov chain, where $x_i \in \{s_1, s_2, \ldots, s_n\}$.
Parameter: $n$ positive integer functions (window sizes) $\varpi_i(k)$ with $k \ge 1$, for $1 \le i \le n$.
Output: A sequence (or a stream) of 0's and 1's.
Main Function:

  $E_i = \phi$ (empty) for all $1 \le i \le n$.
  $k_i = 1$ for all $1 \le i \le n$.
  $c$: the index of the current state, namely, $s_c = x_1$.
  while next input symbol is $s_j$ ($\neq$ null) do
    $E_c = E_c s_j$ (add $s_j$ to $E_c$).
    if $|E_j| \ge \varpi_j(k_j)$ then
      Output $\Psi(E_j)$.
      $E_j = \phi$.
      $k_j = k_j + 1$.
    end if
    $c = j$.
  end while



In the algorithm above, we apply the function $\Psi$ to $E_j$ to generate random bits if and only if the window for $E_j$ is completely filled and the Markov chain is currently at state $s_j$.

For example, let $\varpi_i(k) = 4$ for all $1 \le i \le n$, and let the input sequence be

$$X = s_1s_1s_1s_2s_2s_2s_1s_2s_2.$$

After reading the second-to-last (8th) symbol $s_2$, we have

$$E_1 = s_1s_1s_2s_2,\qquad E_2 = s_2s_2s_1.$$

In this case, $|E_1| \ge 4$, so the window for $E_1$ is full. However, we don't apply $\Psi$ to $E_1$ because the current state of the Markov chain is $s_2$, not $s_1$.

By reading the last (9th) symbol $s_2$, we get

$$E_1 = s_1s_1s_2s_2,\qquad E_2 = s_2s_2s_1s_2.$$

Since the current state of the Markov chain is $s_2$ and $|E_2| \ge 4$, we produce $\Psi(E_2 = s_2s_2s_1s_2)$ and reset $E_2$ to $\phi$.

In the example above, treating $X$ as input to Algorithm B, the output sequence is $\Psi(s_2s_2s_1s_2)$. The algorithm does not output $\Psi(E_1 = s_1s_1s_2s_2)$ until the Markov chain reaches state $s_1$ again. Timing is crucial!
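The pseudocode of Algorithm B translates almost line by line into Python. In this sketch, which is an illustration and not a reference implementation, states are integers. `window(i, k)` plays the role of $\varpi_i(k)$, and `psi` is any function mapping a list of states to a bit string. The dictionaries `E` and `k` hold $E_i$ and $k_i$. Only the current windows are stored, so the input can be a stream.

```python
def algorithm_b(xs, window, psi):
    """Sketch of Algorithm B.

    xs     : the states x_1 x_2 ... (any iterable, so a stream also works)
    window : window(i, k) -> varpi_i(k), the size of the k-th window of s_i
    psi    : maps a list of states to a string of bits
    """
    it = iter(xs)
    c = next(it)          # s_c = x_1
    E = {}                # E_i: the partially filled window of state s_i
    k = {}                # k_i: index of the current window of s_i (starts at 1)
    out = []
    for j in it:
        E.setdefault(c, []).append(j)            # E_c = E_c s_j
        Ej = E.setdefault(j, [])
        if len(Ej) >= window(j, k.get(j, 1)):    # window full and we are at s_j
            out.append(psi(Ej))
            E[j] = []
            k[j] = k.get(j, 1) + 1
        c = j
    return "".join(out)
```

Running it on the example above with `window = lambda i, k: 4` calls `psi` exactly once, on `[2, 2, 1, 2]` (that is, $s_2s_2s_1s_2$), while $E_1$ stays pending.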

Note that Blum's algorithm is a special case of Algorithm B, obtained by setting the window size functions $\varpi_i(k) = 2$ for all $1 \le i \le n$ and $k \in \{1, 2, \ldots\}$. Namely, Algorithm B is a generalization of Blum's algorithm. The key point is that when we increase the window sizes, we can apply more efficient schemes (compared to the von Neumann scheme) for $\Psi$. Assume that a sequence of symbols $X = x_1x_2\ldots x_N$ with $x_N = s_\chi$ has been read by the algorithm above. We want to show that for any $N$, the output sequence is always independent and unbiased. Unfortunately, Blum's proof for the case of $\varpi_i(k) = 2$ cannot be applied to our proposed scheme.

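For all $i$ with $1 \le i \le n$, we can write

$$\pi_i(X) = F_{i1}F_{i2}\ldots F_{im_i}E_i,$$

where $F_{ij}$ with $1 \le j \le m_i$ are the segments used to generate outputs. For all $i, j$, we have $|F_{ij}| = \varpi_i(j)$, and

$$\begin{cases} 0 \le |E_i| < \varpi_i(m_i+1) & \text{if } i = \chi,\\ 0 < |E_i| \le \varpi_i(m_i+1) & \text{otherwise.}\end{cases}$$

See Fig. 2 for a simple illustration.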







[Figure: each exit sequence $\pi_i(X)$ drawn as the segments $F_{i1}, F_{i2}, \ldots, F_{im_i}$ followed by $E_i$.]

Fig. 2. The simplified expressions for the exit sequences of X.

Theorem 8 (Algorithm B). Let the sequence generated by a Markov chain be used as input to Algorithm B. Then Algorithm B generates an independent unbiased sequence of bits in expected linear time.

Proof: In the following proof, we use the same idea as in the proof for Algorithm A.

Let's first divide all the possible input sequences in $\{s_1, s_2, \ldots, s_n\}^N$ into groups, and use $G$ to denote the group set. Two sequences $X$ and $X'$ are in the same group if and only if:

1) $x_1 = x'_1$ and $x_N = x'_N$.

2) For all $i$ with $1 \le i \le n$,

$$\pi_i(X) = F_{i1}F_{i2}\ldots F_{im_i}E_i,\qquad \pi_i(X') = F'_{i1}F'_{i2}\ldots F'_{im_i}E'_i,$$

where $F_{ij}$ and $F'_{ij}$ are the segments used to generate outputs.

3) For all $i, j$, $F_{ij} \doteq F'_{ij}$ (that is, $F'_{ij}$ is a permutation of $F_{ij}$).

4) For all $i$, $E_i = E'_i$.

We will show the following for each group $S \in G$ and any $Y, Y'$ with $|Y| = |Y'|$: the number of sequences that generate $Y$ equals the number of sequences that generate $Y'$. In other words, $|S \cap B_Y| = |S \cap B_{Y'}|$ for $|Y| = |Y'|$, where $B_Y$ is the set of sequences of length $N$ that yield $Y$.

Now, given a group $S$, let's define $S_{ij}$ as the set of all the permutations of $F_{ij}$ for $X \in S$. According to the earlier lemma on $\Psi$, for different $Y \in \{0,1\}^l$, the same number of members of $S_{ij}$ generate $Y$. So we can use $|S_{ij}(l)|$ to denote the number of members of $S_{ij}$ that generate a given binary sequence of length $l$.

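Let $l_{11}, l_{12}, \ldots, l_{1m_1}, l_{21}, \ldots, l_{nm_n}$ be non-negative integers such that their sum is $|Y|$. We want to prove that

$$|S \cap B_Y| = \sum_{l_{11}+\cdots+l_{nm_n}=|Y|}\;\prod_{i=1}^{n}\prod_{j=1}^{m_i}|S_{ij}(l_{ij})|.$$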

The proof is by induction. Let $w = \sum_{i=1}^{n} m_i$. First, the conclusion holds for $w = 1$. Assume the conclusion holds for some $w \ge 1$; we want to prove that it also holds for $w + 1$.

Given an $X \in S$, assume $F_{im_i}$ is the last segment that generates an output. According to our main lemma (Lemma 4), for any sequence in $S$, $F_{im_i}$ is always the last segment that generates an output. Now, let's fix $F_{im_i}$ and assume $F_{im_i}$ generates the last $l_{im_i}$ bits of $Y$. We want to know how many sequences in $S \cap B_Y$ have $F_{im_i}$ as their last segment that generates an output. In order to get the answer, we concatenate $F_{im_i}$ with $E_i$ as the new $E_i$. As a result, we have $\sum_{i=1}^{n} m_i - 1 = w$ segments to generate the first $|Y| - l_{im_i}$ bits of $Y$. Based on our assumption, the number of such sequences will be

$$\sum_{l_{11}+\cdots+l_{i(m_i-1)}+l_{(i+1)1}+\cdots+l_{nm_n}=|Y|-l_{im_i}}\;\prod_{(i',j)\neq(i,m_i)}|S_{i'j}(l_{i'j})|,$$

where $l_{11}, \ldots, l_{i(m_i-1)}, l_{(i+1)1}, \ldots, l_{nm_n}$ are non-negative integers. For each value of $l_{im_i}$, there are $|S_{im_i}(l_{im_i})|$ choices for $F_{im_i}$. Therefore, $|S \cap B_Y|$ can be obtained by multiplying $|S_{im_i}(l_{im_i})|$ by the number above and summing over $l_{im_i}$. This gives the conclusion above.

According to this conclusion, if $|Y| = |Y'|$, then $|S \cap B_Y| = |S \cap B_{Y'}|$. Using the same argument as in Theorem 6, we complete the proof of the theorem. ■
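Theorem 8 can be checked exactly on small cases. The sketch below assumes Blum's special case ($\varpi_i(k) = 2$ with a von Neumann $\Psi$ on each pair). It enumerates every length-$N$ path, starting in $s_1$, of a two-state chain with an assumed transition matrix `P`, and accumulates the probability of each output string. By the theorem, outputs of equal length should be equiprobable. The names `blum` and `output_distribution` are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def blum(xs, w=2):
    """Algorithm B with constant window w and a von Neumann Psi on pairs
    (w = 2 is Blum's algorithm)."""
    def psi(e):
        return "".join("0" if a < b else "1"
                       for a, b in zip(e[::2], e[1::2]) if a != b)
    c, E, out = xs[0], defaultdict(list), []
    for j in xs[1:]:
        E[c].append(j)                 # record the exit from s_c
        if len(E[j]) >= w:             # s_j's window is full and we are at s_j
            out.append(psi(E[j]))
            E[j] = []
        c = j
    return "".join(out)


def output_distribution(P, N, x1=1):
    """Exact P[output = Y] over all length-N paths that start in state x1."""
    n = len(P)
    dist = defaultdict(float)
    for tail in itertools.product(range(1, n + 1), repeat=N - 1):
        xs = (x1,) + tail
        p = math.prod(P[a - 1][b - 1] for a, b in zip(xs, xs[1:]))
        dist[blum(xs)] += p
    return dist
```

Grouping the returned dictionary by string length shows one common probability per length, as Theorem 8 predicts.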

In general, the window size functions $\varpi_i(k)$ for $1 \le i \le n$ can be any positive integer functions. Here, we fix these window size functions to a constant, namely, $\varpi$. By increasing the value of $\varpi$, we can increase the efficiency of the scheme, but at the same time it may cost more storage space and require more waiting time. It is helpful to analyze the relationship between the efficiency of the scheme and the window size $\varpi$.

Theorem 9 (Efficiency). Let $X$ be a sequence of length $N$ generated by a Markov chain with transition matrix $P$, which is used as input to Algorithm B with constant window size $\varpi$. Then as the length of the sequence goes to infinity, the limiting efficiency of Algorithm B is

$$\eta(\varpi) = \sum_{i=1}^{n} u_i\,\eta_i(\varpi),$$

where $U = (u_1, u_2, \ldots, u_n)$ is the stationary distribution of this Markov chain, and $\eta_i(\varpi)$ is the efficiency of $\Psi$ when the input sequence of length $\varpi$ is generated by an $n$-face coin with distribution $(p_{i1}, p_{i2}, \ldots, p_{in})$.

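Proof: When $N \to \infty$, there exists an $\epsilon_N \to 0$ such that with probability $1-\epsilon_N$, $(u_i - \epsilon_N)N < |\pi_i(X)| < (u_i + \epsilon_N)N$ for all $1 \le i \le n$.

The efficiency of Algorithm B can be written as $\eta_N(\varpi) = E[M]/N$, where $M$ is the length of the output sequence. Since $|\pi_i(X)| = m_i\varpi + |E_i|$ with $0 \le |E_i| \le \varpi$, the number $m_i$ of windows of state $s_i$ that generate outputs satisfies $|\pi_i(X)|/\varpi - 1 \le m_i \le |\pi_i(X)|/\varpi$. Each such window generates $\varpi\,\eta_i(\varpi)$ bits on average. With probability $1-\epsilon_N$, we therefore have

$$\frac{(u_i-\epsilon_N)N}{\varpi} - 1 < m_i < \frac{(u_i+\epsilon_N)N}{\varpi}\quad\text{for all } 1 \le i \le n.$$

So when $N \to \infty$, we have

$$\lim_{N\to\infty}\eta_N(\varpi) = \sum_{i=1}^{n} u_i\,\eta_i(\varpi).$$

This completes the proof. ■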
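Let's define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $\sum_k 2^{n_k}$ is the standard binary expansion of $N$. Assume $\Psi$ is the Elias function. Then

$$\eta_i(\varpi) = \frac{1}{\varpi}\sum_{k_1+k_2+\cdots+k_n=\varpi}\alpha\!\left(\frac{\varpi!}{k_1!k_2!\cdots k_n!}\right)p_{i1}^{k_1}p_{i2}^{k_2}\cdots p_{in}^{k_n}.$$

Based on this formula, we can numerically study the relationship between the limiting efficiency and the window size (see Section VII). In fact, when the window size becomes large, the limiting efficiency ($N \to \infty$) approaches the information-theoretic upper bound.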
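The formula for $\eta_i(\varpi)$, combined with Theorem 9, can be evaluated directly. The sketch below uses hypothetical helpers:

- `alpha` implements $\alpha(N)$.
- `eta_elias` sums over all compositions $(k_1, \ldots, k_n)$ of $\varpi$.
- `stationary` estimates $U$ by power iteration, which assumes an irreducible, aperiodic chain.

For the uniform two-state chain with $\varpi = 15$, it returns about 0.7228, in line with the value reported in Section VII.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def alpha(N):
    """alpha(N) = sum_k n_k 2^{n_k}, where N = sum_k 2^{n_k} in binary."""
    return sum(m << m for m in range(N.bit_length()) if N >> m & 1)


def eta_elias(row, w):
    """eta_i(w): expected Elias output per input symbol for w i.i.d. draws
    from an n-face coin with distribution `row`."""
    total = 0.0
    for combo in combinations_with_replacement(range(len(row)), w):
        counts = Counter(combo)                  # the composition (k_1, ..., k_n)
        size = factorial(w)
        for c in counts.values():
            size //= factorial(c)                # w! / (k_1! ... k_n!)
        total += alpha(size) * prod(row[j] ** c for j, c in counts.items())
    return total / w


def stationary(P, iters=2000):
    """Stationary distribution by power iteration (irreducible, aperiodic P)."""
    n = len(P)
    u = [1.0 / n] * n
    for _ in range(iters):
        u = [sum(u[i] * P[i][j] for i in range(n)) for j in range(n)]
    return u


def limiting_efficiency(P, w):
    """eta(w) = sum_i u_i eta_i(w), as in Theorem 9."""
    u = stationary(P)
    return sum(ui * eta_elias(row, w) for ui, row in zip(u, P))
```

With the uniform matrices, $\varpi = 2$ gives $1/4$, $1/3$ and $2/5$ for $n = 2, 3, 5$. Also, $\varpi = 3$ gives only $1/6$ for $n = 2$.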



VI. Algorithm C : An Optimal Algorithm 

Both Algorithm A and Algorithm B are asymptotically 
optimal, but when the length of the input sequence is finite they 
may not be optimal. In this section, we try to construct an op- 
timal algorithm, called Algorithm C, such that its information- 
efficiency is maximized when the length of the input sequence 
is finite. Before presenting this algorithm, we follow the idea of Pae and Loui [10]. We first discuss the equivalent condition for a function $f$ to generate random bits from an arbitrary Markov chain, and then present the sufficient condition for $f$ to be optimal.

Lemma 10 (Equivalent condition). Let $K = \{k_{ij}\}$ be an $n \times n$ non-negative integer matrix with $\sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij} = N - 1$. We define $S_{(\alpha,K)}$ as

$$S_{(\alpha,K)} = \{X \in \{s_1, s_2, \ldots, s_n\}^N \mid k_j(\pi_i(X)) = k_{ij},\ x_1 = s_\alpha\},$$

where $k_j(X)$ is the number of $s_j$ in $X$. A function $f: \{s_1, s_2, \ldots, s_n\}^N \to \{0,1\}^*$ can generate random bits from an arbitrary Markov chain if and only if, for any $(\alpha, K)$ and any two binary sequences $Y$ and $Y'$ with $|Y| = |Y'|$,

$$|S_{(\alpha,K)} \cap B_Y| = |S_{(\alpha,K)} \cap B_{Y'}|,$$

where $B_Y = \{X \mid X \in \{s_1, s_2, \ldots, s_n\}^N, f(X) = Y\}$ is the set of sequences of length $N$ that yield $Y$.

Proof: If $f$ can generate random bits from an arbitrary Markov chain, then $P[f(X) = Y] = P[f(X) = Y']$ for any two binary sequences $Y$ and $Y'$ of the same length. Here, we can write

$$P[f(X) = Y] = \sum_{(\alpha,K)} |S_{(\alpha,K)} \cap B_Y|\,\phi(K)\,P(x_1 = s_\alpha),$$

where $\phi(K) = \prod_{i=1}^{n}\prod_{j=1}^{n} p_{ij}^{k_{ij}}$. The quantity $\phi(K)P(x_1 = s_\alpha)$ is the probability of generating a given sequence with starting state $s_\alpha$ and exit sequences specified by $K$, if such an input sequence exists. Similarly,

$$P[f(X) = Y'] = \sum_{(\alpha,K)} |S_{(\alpha,K)} \cap B_{Y'}|\,\phi(K)\,P(x_1 = s_\alpha).$$

As a result,

$$\sum_{(\alpha,K)} \left(|S_{(\alpha,K)} \cap B_{Y'}| - |S_{(\alpha,K)} \cap B_Y|\right)\phi(K)\,P(x_1 = s_\alpha) = 0.$$

Since $P(x_1 = s_\alpha)$ can be any value in $[0,1]$, for all $1 \le \alpha \le n$ we have

$$\sum_{K} \left(|S_{(\alpha,K)} \cap B_{Y'}| - |S_{(\alpha,K)} \cap B_Y|\right)\phi(K) = 0.$$

The functions $\{\phi(K)\}_K$ are linearly independent in the vector space of functions on $[0,1]$. Hence $|S_{(\alpha,K)} \cap B_Y| = |S_{(\alpha,K)} \cap B_{Y'}|$ for all $(\alpha, K)$ if $|Y| = |Y'|$.

Conversely, suppose that for all $Y, Y'$ of the same length, $|S_{(\alpha,K)} \cap B_Y| = |S_{(\alpha,K)} \cap B_{Y'}|$ for all $(\alpha, K)$. Then $Y$ and $Y'$ have the same probability of being generated. Therefore, $f$ can generate random bits from an arbitrary Markov chain. ■

Let's define $\alpha(N) = \sum_k n_k 2^{n_k}$, where $\sum_k 2^{n_k}$ is the standard binary expansion of $N$. We then have the following sufficient condition for an optimal function.

Lemma 11 (Sufficient condition for an optimal function). Let $f^*$ be a function that generates random bits from an arbitrary Markov chain with unknown transition probabilities. Suppose that for any $\alpha$ and any $n \times n$ non-negative integer matrix $K$ with $\sum_{i=1}^{n}\sum_{j=1}^{n} k_{ij} = N - 1$, the following equation is satisfied:

$$\sum_{X \in S_{(\alpha,K)}} |f^*(X)| = \alpha(|S_{(\alpha,K)}|).$$

Then $f^*$ generates independent unbiased random bits with optimal information efficiency. Note that $|f^*(X)|$ is the length of $f^*(X)$ and $|S_{(\alpha,K)}|$ is the size of $S_{(\alpha,K)}$.

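Proof: Let $h$ denote an arbitrary function that is able to generate random bits from any Markov chain. According to Lemma 2.9 in [10], we know that

$$\sum_{X \in S_{(\alpha,K)}} |h(X)| \le \alpha(|S_{(\alpha,K)}|).$$

Then the average output length of $h$ is

$$E(|h(X)|) = \sum_{(\alpha,K)}\sum_{X \in S_{(\alpha,K)}} |h(X)|\,\phi(K)P[x_1 = s_\alpha] \le \sum_{(\alpha,K)} \alpha(|S_{(\alpha,K)}|)\,\phi(K)P[x_1 = s_\alpha] = \sum_{(\alpha,K)}\sum_{X \in S_{(\alpha,K)}} |f^*(X)|\,\phi(K)P[x_1 = s_\alpha] = E(|f^*(X)|).$$

So $f^*$ is the optimal one. This completes the proof. ■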
Here, we construct the following algorithm (Algorithm C), which satisfies all the conditions in Lemma 10 and Lemma 11. As a result, it can generate unbiased random bits from an arbitrary Markov chain with optimal information efficiency.

Algorithm C

Input: A sequence $X = x_1x_2\ldots x_N$ produced by a Markov chain, where $x_i \in S = \{s_1, s_2, \ldots, s_n\}$.
Output: A sequence of 0's and 1's.
Main Function:

1) Get the matrix $K = \{k_{ij}\}$ with

$$k_{ij} = k_j(\pi_i(X)).$$

2) Define $S(X)$ as

$$S(X) = \{X' \mid k_j(\pi_i(X')) = k_{ij}\ \forall i,j;\ x'_1 = x_1\},$$

then compute $|S(X)|$.

3) Compute the rank $r(X)$ of $X$ in $S(X)$ with respect to a given order.

4) According to $|S(X)|$ and $r(X)$, determine the output sequence. Let $\sum_k 2^{n_k}$ be the standard binary expansion of $|S(X)|$ with $n_1 > n_2 > \cdots$, and assume the starting value of $r(X)$ is 0. If $r(X) < 2^{n_1}$, the output is the $n_1$-digit binary representation of $r(X)$. If $\sum_{k=1}^{i} 2^{n_k} \le r(X) < \sum_{k=1}^{i+1} 2^{n_k}$, the output is the $n_{i+1}$-digit binary representation of $r(X) - \sum_{k=1}^{i} 2^{n_k}$ (equivalently, the last $n_{i+1}$ binary digits of $r(X)$).

Comment: The fast calculations of $|S(X)|$ and $r(X)$ are given in the rest of this section.
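For small $N$, Algorithm C can be run by brute force, which makes the conditions of Lemma 10 and Lemma 11 easy to check. The sketch below is only an illustration. It enumerates all $n^{N-1}$ sequences starting at $x_1$ and groups them by the exit-count matrix $K$ (the classes $S(X)$). It uses plain lexicographic order of the sequences as the "given order" of step 3, and applies step 4 through the hypothetical helper `bits_from_rank`. The sketch is exponential in $N$, unlike the fast method developed in the lemmas that follow.

```python
from collections import defaultdict
from itertools import product


def exit_counts(xs, n):
    """K = {k_ij}: number of transitions s_i -> s_j in xs (states 1..n)."""
    K = [[0] * n for _ in range(n)]
    for a, b in zip(xs, xs[1:]):
        K[a - 1][b - 1] += 1
    return tuple(map(tuple, K))


def bits_from_rank(size, rank):
    """Step 4: output bits from the binary expansion 2^{n1} + 2^{n2} + ... of size."""
    for m in reversed(range(size.bit_length())):
        if size >> m & 1:
            if rank < (1 << m):
                return format(rank, "b").zfill(m) if m else ""
            rank -= 1 << m
    raise ValueError("rank must be smaller than size")


def algorithm_c_table(N, n, x1):
    """Brute force: map every length-N sequence starting at x1 to its output."""
    classes = defaultdict(list)
    for tail in product(range(1, n + 1), repeat=N - 1):
        xs = (x1,) + tail
        classes[exit_counts(xs, n)].append(xs)   # the class S(X) of xs
    table = {}
    for members in classes.values():
        members.sort()                           # the "given order" of step 3
        for r, xs in enumerate(members):
            table[xs] = bits_from_rank(len(members), r)
    return table
```

Within each class, the outputs have total length $\alpha(|S(X)|)$. This is the equality required by Lemma 11.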

In Algorithm A, when we use Elias's function as $\Psi$, the limiting efficiency $\eta_N = E[M]/N$ (as $N \to \infty$) realizes the bound $H(X)/N$. Algorithm C is optimal, so it has the same or higher efficiency. Therefore, the limiting efficiency of Algorithm C as $N \to \infty$ also realizes the bound $H(X)/N$.

In Algorithm C, for an input sequence $X$ with $x_N = s_\chi$, we can rank it with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Here, we define

$$\theta(X) = \left(\pi_1(X)_{|\pi_1(X)|}, \ldots, \pi_n(X)_{|\pi_n(X)|}\right),$$

which is the vector of the last symbols of $\pi_i(X)$ for $1 \le i \le n$. And $\sigma(X)$ is the complement of $\theta(X)$ in $\pi(X)$, namely,

$$\sigma(X) = \left(\pi_1(X)^{|\pi_1(X)|-1}, \ldots, \pi_n(X)^{|\pi_n(X)|-1}\right).$$

For example, when the input sequence is

$$X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1,$$

its exit sequences are

$$\pi(X) = [s_4s_3s_1s_2,\ s_1s_3s_3,\ s_2s_1s_4,\ s_2s_1].$$

Then for this input sequence $X$, we have

$$\theta(X) = [s_2, s_3, s_4, s_1],\qquad \sigma(X) = [s_4s_3s_1,\ s_1s_3,\ s_2s_1,\ s_2].$$

Based on the lexicographic order defined above, both $|S(X)|$ and $r(X)$ can be obtained using a brute-force search. However, this approach is not computationally efficient. Here, we describe an efficient algorithm for computing $|S(X)|$ and $r(X)$, such that Algorithm C is computable in $O(N\log^3 N\log\log N)$ time. This method is inspired by the algorithm for computing the Elias function that is described in [12].
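The $\theta(X)$ and $\sigma(X)$ of the example earlier in this section can be reproduced with a short sketch. It uses hypothetical names and represents states as the integers 1..n. A state with an empty exit sequence gets `None` in $\theta$.

```python
def exit_sequences(xs, n):
    """pi_i(X) for i = 1..n: the states that follow each visit to s_i."""
    pi = [[] for _ in range(n)]
    for a, b in zip(xs, xs[1:]):
        pi[a - 1].append(b)
    return pi


def theta_sigma(xs, n):
    """theta(X): last symbol of each exit sequence; sigma(X): all the rest."""
    pi = exit_sequences(xs, n)
    theta = [p[-1] if p else None for p in pi]
    sigma = [p[:-1] for p in pi]
    return theta, sigma
```

For $X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$, this returns $\theta = [2, 3, 4, 1]$ and $\sigma = [[4, 3, 1], [1, 3], [2, 1], [2]]$.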

Lemma 12. $|S(X)|$ in Algorithm C is computable in $O(N(\log N\log\log N)^2)$ time.

Proof: The idea for computing $|S(X)|$ in Algorithm C is that we can divide $S(X)$ into different classes, denoted by $S(X,\theta)$ for different $\theta$, such that

$$S(X,\theta) = \{X' \mid \forall i,j,\ k_j(\pi_i(X')) = k_{ij},\ x'_1 = x_1,\ \theta(X') = \theta\},$$

where $k_{ij} = k_j(\pi_i(X))$ is the number of $s_j$'s in $\pi_i(X)$ for all $1 \le i,j \le n$, and $\theta(X)$ is the vector of the last symbols of $\pi(X)$ defined above. As a result, we have $|S(X)| = \sum_\theta |S(X,\theta)|$.

Although it is not easy to calculate $|S(X)|$ directly, it is much easier to compute $|S(X,\theta)|$ for a given $\theta$.

For a given $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$, we first need to determine whether $S(X,\theta)$ is empty or not. In order to do this, we quickly construct a collection of exit sequences $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$ by moving the first $\theta_i$ in $\pi_i(X)$ to the end, for all $1 \le i \le n$. According to the main lemma, $S(X,\theta)$ is empty if and only if either $\pi_i(X)$ does not include $\theta_i$ for some $i$, or $(x_1, \Lambda)$ is not feasible.

If $S(X,\theta)$ is not empty, then $(x_1, \Lambda)$ is feasible. In this case, based on the main lemma, we have

$$|S(X,\theta)| = \prod_{i=1}^{n}\frac{(k_{i1}+k_{i2}+\cdots+k_{in}-1)!}{k_{i1}!\cdots(k_{i\theta_i}-1)!\cdots k_{in}!} = \left(\prod_{i=1}^{n}\frac{(k_{i1}+k_{i2}+\cdots+k_{in})!}{k_{i1}!k_{i2}!\cdots k_{in}!}\right)\prod_{i=1}^{n}\frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}},$$

where the first term, denoted by $Z$, is computable in $O(N(\log N\log\log N)^2)$ time [12]. Furthermore,

$$|S(X)| = Z\sum_{\theta:\,|S(X,\theta)|>0}\;\prod_{i=1}^{n}\frac{k_{i\theta_i}}{k_{i1}+k_{i2}+\cdots+k_{in}}$$

is also computable in $O(N(\log N\log\log N)^2)$ time. ■
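The counting argument in Lemma 12 can be checked on small inputs. The sketch below uses hypothetical names and plain integer arithmetic instead of the fast multiplication the lemma relies on, so it does not achieve the stated complexity. It works as follows:

1. Enumerate $\theta$.
2. Build $\Lambda$ by moving the first $\theta_i$ of each $\pi_i(X)$ to the end.
3. Test feasibility with the greedy walk described in the appendix.
4. Add up the tail-fixed permutation counts.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def exit_sequences(xs, n):
    """pi_i(X) for i = 1..n: the states that follow each visit to s_i."""
    pi = [[] for _ in range(n)]
    for a, b in zip(xs, xs[1:]):
        pi[a - 1].append(b)
    return pi


def feasible(start, lams):
    """(s_start, Lambda) is feasible iff the greedy walk uses every exit."""
    used = [0] * len(lams)
    cur = start
    while used[cur - 1] < len(lams[cur - 1]):
        nxt = lams[cur - 1][used[cur - 1]]
        used[cur - 1] += 1
        cur = nxt
    return all(u == len(lam) for u, lam in zip(used, lams))


def size_of_S(xs, n):
    """|S(X)| as the sum over theta of |S(X, theta)|, counted directly."""
    pi = exit_sequences(xs, n)
    active = [i for i in range(n) if pi[i]]          # states with at least one exit
    total = 0
    for theta in product(*(sorted(set(pi[i])) for i in active)):
        lams = [list(p) for p in pi]
        for i, t in zip(active, theta):
            lams[i].remove(t)                        # take the first theta_i ...
            lams[i].append(t)                        # ... and move it to the end
        if not feasible(xs[0], lams):
            continue                                 # S(X, theta) is empty
        count = 1
        for i, t in zip(active, theta):
            c = Counter(pi[i])
            c[t] -= 1                                # the tail is fixed to theta_i
            count *= factorial(len(pi[i]) - 1) // prod(factorial(v) for v in c.values())
        total += count
    return total
```

On short sequences the result can be compared with an exhaustive count of the sequences that share $x_1$ and the matrix $K$.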



Lemma 13. $r(X)$ in Algorithm C is computable in $O(N\log^3 N\log\log N)$ time.

Proof: Using the calculations in the lemma above, we can obtain $r(X)$ when $X$ is ranked with respect to the lexicographic order of $\theta(X)$ and $\sigma(X)$. Let $r(X,\theta(X))$ denote the rank of $X$ in $S(X,\theta(X))$. Then we have

$$r(X) = \sum_{\theta<\theta(X)} |S(X,\theta)| + r(X,\theta(X)),$$

where $<$ is based on the lexicographic order. In the formula, $\sum_{\theta<\theta(X)} |S(X,\theta)|$ can be efficiently obtained by computing

$$Z\sum_{\theta<\theta(X):\,|S(X,\theta)|>0}\frac{\prod_{i=1}^{n} k_{i\theta_i}}{\prod_{i=1}^{n}(k_{i1}+k_{i2}+\cdots+k_{in})},$$

where $Z$ is defined in the last lemma. It remains to compute $r(X,\theta(X))$ with respect to the lexicographic order of $\sigma(X)$. $\sigma(X)$ can be written as a group of sequences $[\sigma_1(X), \sigma_2(X), \ldots, \sigma_n(X)]$ such that for all $1 \le i \le n$,

$$\sigma_i(X) = \pi_i(X)^{|\pi_i(X)|-1}.$$

There are $M = (N-1) - n$ symbols in $\sigma(X)$. Let $r_i(X)$ be the number of sequences $X' \in S(X,\theta(X))$ such that the first $M - i$ symbols of $\sigma(X')$ are the same as those of $\sigma(X)$, and the $(M-i+1)$th symbol of $\sigma(X')$ is smaller than that of $\sigma(X)$. Then we get

$$r(X,\theta(X)) = \sum_{i=1}^{M} r_i(X).$$

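Assume the $(M-i+1)$th symbol in $\sigma(X)$ is the $u_i$th symbol in $\sigma_{v_i}(X)$. Then we get

$$r_i(X) = \sum_{s_w < s_{w_i}} \frac{k_w(T_i)}{|T_i|}\cdot\frac{|T_i|!}{k_1(T_i)!\cdots k_n(T_i)!}\prod_{j=v_i+1}^{n} N_j(X),$$

where:

- $T_i$ is the subsequence of $\sigma_{v_i}(X)$ from the $u_i$th symbol to the end;
- $N_j(X)$ is the number of permutations of $\sigma_j(X)$;
- $w_i$ is the index of the first symbol of $T_i$, i.e., $\sigma_{v_i}(X)[u_i] = s_{w_i}$.

Let's define the values

$$\rho_i^0 = \frac{|T_i|}{k_{w_i}(T_i)},\qquad \lambda_i^0 = \frac{\sum_{s_w<s_{w_i}} k_w(T_i)}{k_{w_i}(T_i)}.$$

Since $\rho_1^0\rho_2^0\cdots\rho_i^0 = \frac{|T_i|!}{k_1(T_i)!\cdots k_n(T_i)!}\prod_{j=v_i+1}^{n}N_j(X)$, each term satisfies $r_i(X) = \lambda_i^0\rho_{i-1}^0\cdots\rho_1^0$. Then $r(X,\theta(X))$ can be written as

$$r(X,\theta(X)) = \sum_{i=1}^{M}\lambda_i^0\rho_{i-1}^0\rho_{i-2}^0\cdots\rho_1^0.$$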

Suppose that $\log_2 M$ is an integer. Otherwise, we can add trivial terms (with $\lambda_i^0 = 0$ and $\rho_i^0 = 1$) to the formula above to make $\log_2 M$ an integer. In order to quickly calculate $r(X,\theta(X))$, the following calculations are performed:

$$\rho_i^s = \rho_{2i-1}^{s-1}\rho_{2i}^{s-1},\qquad \lambda_i^s = \lambda_{2i-1}^{s-1} + \rho_{2i-1}^{s-1}\lambda_{2i}^{s-1},$$

$$s = 1, 2, \ldots, \log_2 M;\quad i = 1, 2, \ldots, 2^{-s}M.$$

Then, applying the method in [12], we have

$$r(X,\theta(X)) = \lambda_1^{\log_2 M},$$

which is computable in $O(M\log^3 M\log\log M)$ time. As a result, for a fixed $n$, $r(X)$ is computable in $O(N\log^3 N\log\log N)$ time. ■

Based on the discussion above, Algorithm C is computable in $O(N\log^3 N\log\log N)$ time.
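The pairwise scheme in the proof of Lemma 13 is a product-tree evaluation of $\sum_i \lambda_i^0\rho_{i-1}^0\cdots\rho_1^0$. The sketch below uses hypothetical names and exact `Fraction` arithmetic rather than the fast integer methods of [12]. It computes $\lambda_i^0$ and $\rho_i^0$ from $\sigma(X)$, with $i = 1$ at the last symbol of $\sigma(X)$. It then pads with the trivial terms $\lambda = 0$, $\rho = 1$, and merges adjacent pairs by $(\rho, \lambda), (\rho', \lambda') \mapsto (\rho\rho', \lambda + \rho\lambda')$.

```python
from collections import Counter
from fractions import Fraction


def exit_sequences(xs, n):
    """pi_i(X) for i = 1..n: the states that follow each visit to s_i."""
    pi = [[] for _ in range(n)]
    for a, b in zip(xs, xs[1:]):
        pi[a - 1].append(b)
    return pi


def lambdas_and_rhos(xs, n):
    """lambda_i^0 and rho_i^0 for i = 1..M (i = 1 is the last symbol of sigma)."""
    sigma = [p[:-1] for p in exit_sequences(xs, n)]
    lam, rho = [], []
    for block in reversed(sigma):
        for u in range(len(block) - 1, -1, -1):
            T = Counter(block[u:])            # T_i: sigma_{v_i} from position u on
            w = block[u]                      # s_{w_i}, the first symbol of T_i
            rho.append(Fraction(sum(T.values()), T[w]))
            lam.append(Fraction(sum(c for s, c in T.items() if s < w), T[w]))
    return lam, rho


def tree_sum(lam, rho):
    """sum_i lam_i * rho_1 ... rho_{i-1}, evaluated by pairwise merging."""
    size = 1
    while size < len(lam):
        size *= 2
    lam = list(lam) + [Fraction(0)] * (size - len(lam))   # trivial terms
    rho = list(rho) + [Fraction(1)] * (size - len(rho))
    while len(lam) > 1:
        lam = [lam[2 * i] + rho[2 * i] * lam[2 * i + 1] for i in range(len(lam) // 2)]
        rho = [rho[2 * i] * rho[2 * i + 1] for i in range(len(rho) // 2)]
    return lam[0]
```

The merged value equals the rank $r(X,\theta(X))$ of $X$ among the sequences that share $x_1$, $K$ and $\theta(X)$, ordered lexicographically by $\sigma$.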

VII. Numerical Results 

In this section, we describe numerical results related to the implementations of Algorithm A, Algorithm B, and Algorithm C. We use the Elias function for $\Psi$.

In the first experiment, we use the following randomly generated transition matrix for a Markov chain with three states.



$$P = \begin{pmatrix} 0.300987 & 0.468876 & 0.230135\\ 0.462996 & 0.480767 & 0.056236\\ 0.42424 & 0.032404 & 0.543355 \end{pmatrix}$$



Consider a sequence of length 12 that is generated by the Markov chain defined above, and assume that $s_1$ is the first state of this sequence. Then there are $3^{11} = 177147$ possible input sequences. For each possible input sequence, we can compute its generating probability and the corresponding output sequences of our three algorithms. Table II presents the probabilities of all possible output sequences for the three algorithms. The results show that the outputs of the algorithms are indeed independent unbiased sequences. Also, Algorithm C has the highest information efficiency (it is optimal), and Algorithm A has a higher information efficiency than Algorithm B (with window size 4).

In the second calculation, we want to test the influence of the window size $\varpi$ (assume $\varpi_i(k) = \varpi$ for $1 \le i \le n$) on the efficiency of Algorithm B. Since the efficiency depends on the transition matrix of the Markov chain, we evaluate the efficiency for the uniform transition matrix, namely, all the entries are $\frac{1}{n}$, where $n$ is the number of states. We assume that $N$ is infinitely large. In this case, the stationary distribution of the Markov chain is $(\frac{1}{n}, \ldots, \frac{1}{n})$. Fig. 3 shows the results. When $\varpi = 2$ (Blum's Algorithm), the limiting efficiencies for $n = (2, 3, 5)$ are $(\frac{1}{4}, \frac{1}{3}, \frac{2}{5})$, respectively. When $\varpi = 15$, the corresponding efficiencies are $(0.7228, 1.1342, 1.5827)$.

Output            Probability      Probability              Probability
                  (Algorithm A)    (Algorithm B, ϖ = 4)     (Algorithm C)
Λ (empty)         0.0224191        0.1094849                0.0208336
0                 0.0260692        0.0215901                0.0200917
1                 0.0260692        0.0215901                0.0200917
00                0.0298179        0.1011625                0.0206147
10                0.0298179        0.1011625                0.0206147
01                0.0298179        0.1011625                0.0206147
11                0.0298179        0.1011625                0.0206147
000               0.0244406        0.0242258                0.0171941
100               0.0244406        0.0242258                0.0171941
...               ...              ...                      ...
011111            0.0018831        1.39E-5                  0.0029596
111111            0.0018831        1.39E-5                  0.0029596
0000000           1.305E-4         -                        6.056E-4
1000000           1.305E-4         -                        6.056E-4
...               ...              -                        ...
0111111           1.305E-4         -                        6.056E-4
1111111           1.305E-4         -                        6.056E-4
00000000          -                -                        1.44E-5
10000000          -                -                        1.44E-5
...               -                -                        ...
01111111          -                -                        1.44E-5
11111111          -                -                        1.44E-5
Expected length   3.829            2.494                    4.355

TABLE II
The probability of each possible output sequence and the expected output length.




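[Figure: limiting efficiency of Algorithm B plotted against the fixed window size $\varpi$ (horizontal axis from 5 to 30), one curve per value of $n$.]

Fig. 3. The limiting efficiency of Algorithm B as a function of the window size $\varpi$ for different numbers of states $n$, assuming the transition probability $p_{ij} = \frac{1}{n}$ for all $1 \le i,j \le n$.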



So if the input sequence is long enough, changing $\varpi$ from 2 to 15 increases the efficiency by 189% for $n = 2$, 240% for $n = 3$, and 296% for $n = 5$. When $\varpi$ is small, we can increase the efficiency of Algorithm B significantly by increasing the window size $\varpi$. When $\varpi$ becomes larger, the efficiency of Algorithm B converges to the information-theoretic upper bound, namely, $\log_2 n$. Note that 3 is not a good value for the window size in the algorithm, because the Elias function is not very efficient when the length of the input sequence is 3. Let's consider a biased coin with two states $s_1, s_2$. If the input sequence is $s_1s_1s_1$ or $s_2s_2s_2$, the Elias function generates nothing. In all other cases, it has only a 2/3 chance of generating one bit and a 1/3 chance of generating nothing. As a result, the efficiency is even worse than the efficiency when the length of the input sequence is 2.
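The comparison above holds for every bias, not only for a fair coin. Write $P(s_1) = p$ and $P(s_2) = q = 1 - p$, and use $1 - p^3 - q^3 = 3pq$. The expected number of output bits per input symbol is then

$$\eta(2) = \frac{2pq\cdot 1}{2} = pq,\qquad \eta(3) = \frac{(1-p^3-q^3)\cdot\frac{2}{3}\cdot 1}{3} = \frac{3pq\cdot\frac{2}{3}}{3} = \frac{2pq}{3} < pq,$$

so a window of length 3 yields only two thirds of the per-symbol efficiency of a window of length 2.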

VIII. Concluding Remarks 

We considered the classical problem of generating inde- 
pendent unbiased bits from an arbitrary Markov chain with 
unknown transition probabilities. Our main contribution is the 
first known algorithm that has expected linear time complexity 
and achieves the information-theoretic upper bound on effi- 
ciency. 

Our work is related to a number of interesting results in both computer science and information theory. In computer science, attention has focused on extracting randomness from a general weak source (introduced by Zuckerman [17]). This led to the concept of an extractor, which converts weak random sequences to 'random-looking' sequences using an additional small number of truly random bits. During the past two decades, extractors and their applications have been studied extensively; see [9], [14] for surveys on the topic. Our algorithms generate truly random bits (given a perfect Markov chain as a source). In contrast, the goal of extractors is to generate 'random-looking' sequences that are asymptotically close to random bits.

In information theory, it was discovered that optimal source codes can be used as universal random bits generators from arbitrary stationary ergodic random sources [15], [5]. When the input sequence is generated by a stationary ergodic process and is long enough, one can obtain an output sequence that behaves like truly random bits in the sense of normalized divergence. However, in some cases, the definition of normalized divergence is not strong enough. For example, suppose $Y$ is a sequence of unbiased random bits in the sense of normalized divergence, and $1 * Y$ is $Y$ with a 1 concatenated at the beginning. If the sequence $Y$ is long enough, the sequence $1 * Y$ is also a sequence of unbiased random bits in the sense of normalized divergence. However, the sequence $1 * Y$ might not be useful in applications that are sensitive to the randomness of the first bit.

Appendix 

In this appendix, we prove the main lemma. 
Lemma 4 (Main Lemma: Feasibility and equivalence of exit sequences). Let $s_\alpha$ be a starting state, and let $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_n]$ and $\Gamma = [\Gamma_1, \ldots, \Gamma_n]$ be two collections of sequences such that $\Lambda_i \doteq \Gamma_i$ (tail-fixed permutation) for all $1 \le i \le n$. Then $(s_\alpha, \Lambda)$ is feasible if and only if $(s_\alpha, \Gamma)$ is feasible.

In the rest of the appendix we prove the main lemma. To illustrate the claim in the lemma, we express $s_\alpha$ and $\Lambda$ as a directed graph that has labels on the vertices and edges; we call this graph a sequence graph. For example, when $s_\alpha = s_1$ and $\Lambda = [s_4s_3s_1s_2, s_1s_3s_3, s_2s_1s_4, s_2s_1]$, we have the directed graph in Fig. 4.

Let $V$ denote the vertex set; then

$$V = \{s_0, s_1, s_2, \ldots, s_n\},$$

and the edge set is

$$E = \{(s_i, \Lambda_i[k])\} \cup \{(s_0, s_\alpha)\}.$$

For each edge $(s_i, \Lambda_i[k])$, the label of this edge is $k$. For the edge $(s_0, s_\alpha)$, the label is 1. Thus the label set of the outgoing edges of each state is $\{1, 2, \ldots\}$.




Fig. 4. An example of a sequence graph G. 

Given the labeling of the directed graph as defined above, we say that it contains a complete walk if there is a path in the graph that visits all the edges, without visiting an edge twice, in the following way: (1) Start from $s_0$. (2) At each vertex, choose the unvisited outgoing edge with the minimal label to follow. Obviously, the labeling corresponding to $(s_\alpha, \Lambda)$ is a complete walk if and only if $(s_\alpha, \Lambda)$ is feasible. In this case, for short, we also say that $(s_\alpha, \Lambda)$ is a complete walk. Before continuing to prove the main lemma, we first give Lemma 14 and Lemma 15.
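The complete-walk rule is easy to simulate. In this sketch, `greedy_walk` is a hypothetical name, states are integers, and $s_0$ is left implicit. The $i$-th list holds $\Lambda_i$, so following its $k$-th entry corresponds to the edge labelled $k$. The walk is complete exactly when every exit has been used.

```python
def greedy_walk(start, lams):
    """Walk the sequence graph of (s_start, Lambda) from s_0, always taking the
    unvisited outgoing edge with the smallest label.

    Returns the visited states (without s_0) and whether the walk is complete.
    """
    used = [0] * len(lams)                 # labels already used at each state
    path = [start]
    cur = start
    while used[cur - 1] < len(lams[cur - 1]):
        nxt = lams[cur - 1][used[cur - 1]]
        used[cur - 1] += 1
        path.append(nxt)
        cur = nxt
    complete = all(u == len(lam) for u, lam in zip(used, lams))
    return path, complete
```

For $s_\alpha = s_1$ and $\Lambda = [s_4s_3s_1s_2, s_1s_3s_3, s_2s_1s_4, s_2s_1]$, the walk visits $s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$ and is complete. Relabelling the outgoing edges of $s_1$ as $s_1s_2s_3s_4$ gives the walk $s_1s_1s_2s_1s_3s_2s_3s_1s_4s_2s_3s_4s_1$, which is also complete.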

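Lemma 14. Assume $(s_\alpha, \Lambda)$ with $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_\chi, \ldots, \Lambda_n]$ is a complete walk that ends at state $s_\chi$. If $\Lambda_\chi \doteq \Gamma_\chi$ (permutation), then $(s_\alpha, \Gamma)$ with $\Gamma = [\Lambda_1, \ldots, \Gamma_\chi, \ldots, \Lambda_n]$ is also a complete walk ending at $s_\chi$.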

Proof: $(s_\alpha, \Lambda)$ and $(s_\alpha, \Gamma)$ correspond to two different labelings of the same directed graph $G$, denoted by $L_1$ and $L_2$. Since $L_1$ is a complete walk, it traverses all the edges in $G$ one by one, denoted as

$$(s_{i_1}, s_{j_1}), (s_{i_2}, s_{j_2}), \ldots, (s_{i_N}, s_{j_N}),$$

where $s_{i_1} = s_0$ and $s_{j_N} = s_\chi$. We call $\{1, 2, \ldots, N\}$ the indexes of the edges.

Based on $L_2$, let's walk on $G$ starting from $s_0$ until there are no unvisited outgoing edges to select. Assume the following edges have been visited in this walk:

$$(s_{i_{w_1}}, s_{j_{w_1}}), (s_{i_{w_2}}, s_{j_{w_2}}), \ldots, (s_{i_{w_M}}, s_{j_{w_M}}),$$

where $w_1, w_2, \ldots, w_M$ are distinct indexes chosen from $\{1, 2, \ldots, N\}$ and $s_{i_{w_1}} = s_0$. In order to prove that $L_2$ is a complete walk, we need to show that (1) $s_{j_{w_M}} = s_\chi$ and (2) $M = N$.

First, let's prove that $s_{j_{w_M}} = s_\chi$. In $G$, let $N_i^{(out)}$ denote the number of outgoing edges of $s_i$, and let $N_i^{(in)}$ denote the number of incoming edges of $s_i$. Then we have

$$N_0^{(in)} = 0,\quad N_0^{(out)} = 1,\qquad N_\chi^{(in)} = N_\chi^{(out)} + 1,\qquad N_i^{(in)} = N_i^{(out)}\ \text{for } i \neq 0, \chi.$$

Based on these relations, any walk starting from $s_0$ in $G$ eventually ends at state $s_\chi$. This is because, for $i \neq \chi, 0$, we can always leave $s_i$, since $N_i^{(in)} = N_i^{(out)}$.

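Now, we prove that $M = N$ by contradiction. Assume $M \neq N$, and define

$$V = \{w_1, w_2, \ldots, w_M\},\qquad \bar{V} = \{1, 2, \ldots, N\} \setminus \{w_1, w_2, \ldots, w_M\},$$

where $V$ corresponds to the visited edges based on $L_2$ and $\bar{V}$ corresponds to the unvisited edges based on $L_2$. Let $v = \min(\bar{V})$; then $(s_{i_v}, s_{j_v})$ is the unvisited edge with the minimal index. Let $l = i_v$; then $(s_{i_v}, s_{j_v})$ is an outgoing edge of $s_l$. Here $l \neq \chi$, because all the outgoing edges of $s_\chi$ have been visited. Also $l \neq 0$, since the only edge leaving $s_0$ is visited first. Let $M_l^{(in)}$ be the number of visited incoming edges of $s_l$ and $M_l^{(out)}$ the number of visited outgoing edges of $s_l$. Since the walk based on $L_2$ ends at $s_\chi \neq s_l$, every visit into $s_l$ is followed by a visit out of it, so

$$M_l^{(in)} = M_l^{(out)};$$

see Fig. 5 for an example.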




Fig. 5. An illustration of the incoming and outgoing edges of $s_l$. Solid arrows indicate visited edges, and dashed arrows indicate unvisited edges.

Note that the labels of the outgoing edges of $s_l$ are the same for $L_1$ and $L_2$, since $l \neq \chi, 0$. Therefore, based on $L_1$, $M_l^{(out)}$ outgoing edges of $s_l$ must have been visited before edge $(s_{i_v}, s_{j_v})$. As a result, based on $L_1$, $M_l^{(out)} + 1 = M_l^{(in)} + 1$ incoming edges of $s_l$ must have been visited before $(s_{i_v}, s_{j_v})$. Only $M_l^{(in)}$ incoming edges of $s_l$ have been visited based on $L_2$. Hence, among these $M_l^{(in)} + 1$ incoming edges, at least one edge $(s_{i_u}, s_{j_u})$ satisfies $u \in \bar{V}$.

By our assumption, both $u, v \in \bar{V}$ and $v$ is the minimal one, so $u > v$. On the other hand, $(s_{i_u}, s_{j_u})$ is visited before $(s_{i_v}, s_{j_v})$ based on $L_1$, so $u < v$. This is a contradiction. Therefore, $M = N$.

This completes the proof. ■
Here is an example of the lemma above. When $s_\alpha = s_1$ and $\Lambda = [s_4s_3s_1s_2, s_1s_3s_3, s_2s_1s_4, s_2s_1]$, $(s_\alpha, \Lambda)$ is feasible. The labeling of the directed graph corresponding to $(s_\alpha, \Lambda)$ is given in Fig. 4. It is a complete walk starting at state $s_0$ and ending at state $s_1$, and its path is

$$s_0s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1.$$

By permuting the labels of the outgoing edges of $s_1$, we obtain the graph shown in Fig. 6. The new labeling of $G$ is also a complete walk ending at state $s_1$, and its path is

$$s_0s_1s_1s_2s_1s_3s_2s_3s_1s_4s_2s_3s_4s_1.$$




Fig. 6. The sequence graph G with new labels. 

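Based on Lemma 14, we have the following result.

Lemma 15. Let $s_\alpha$ be a starting state, and let $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_k, \ldots, \Lambda_n]$ and $\Gamma = [\Lambda_1, \ldots, \Gamma_k, \ldots, \Lambda_n]$ be two collections of sequences such that $\Gamma_k \doteq \Lambda_k$ (tail-fixed permutation). Then $(s_\alpha, \Lambda)$ and $(s_\alpha, \Gamma)$ have the same feasibility.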

Proof: We prove that if $(s_\alpha, \Lambda)$ is feasible, then $(s_\alpha, \Gamma)$ is also feasible. If $(s_\alpha, \Lambda)$ is feasible, there exists a sequence $X$ such that $s_\alpha = x_1$ and $\Lambda = \pi(X)$. Suppose its last element is $x_N = s_\chi$.

When $k = \chi$, Lemma 14 shows that $(s_\alpha, \Gamma)$ is feasible.

When $k \neq \chi$, assume that $\Lambda_k = \pi_k(X) = x_{k_1}x_{k_2}\ldots x_{k_w}$. Let's consider the subsequence $\bar{X} = x_1x_2\ldots x_{k_w-1}$ of $X$. Then $\pi_k(\bar{X}) = \Lambda_k^{|\Lambda_k|-1}$, and the last element of $\bar{X}$ is $s_k$. Since $\Gamma_k^{|\Gamma_k|-1} \doteq \Lambda_k^{|\Lambda_k|-1}$, Lemma 14 gives a sequence $x'_1x'_2\ldots x'_{k_w-1}$ with $x'_1 = x_1$ and $x'_{k_w-1} = x_{k_w-1}$ such that

$$\pi(x'_1x'_2\ldots x'_{k_w-1}) = [\pi_1(\bar{X}), \ldots, \pi_{k-1}(\bar{X}), \Gamma_k^{|\Gamma_k|-1}, \pi_{k+1}(\bar{X}), \ldots, \pi_n(\bar{X})].$$

Let $x'_{k_w}x'_{k_w+1}\ldots x'_N = x_{k_w}x_{k_w+1}\ldots x_N$, i.e., concatenate $x_{k_w}x_{k_w+1}\ldots x_N$ to the end of $x'_1x'_2\ldots x'_{k_w-1}$. We then obtain a sequence $x'_1x'_2\ldots x'_N$ whose exit sequence of state $s_k$ is

$$\Gamma_k^{|\Gamma_k|-1}x_{k_w} = \Gamma_k,$$

and whose exit sequence of state $s_i$ with $i \neq k$ is $\Lambda_i = \pi_i(X)$.

So if $(s_\alpha, \Lambda)$ is feasible, then $(s_\alpha, \Gamma)$ is also feasible. Similarly, if $(s_\alpha, \Gamma)$ is feasible, then $(s_\alpha, \Lambda)$ is feasible. As a result, $(s_\alpha, \Lambda)$ and $(s_\alpha, \Gamma)$ have the same feasibility. ■

By the lemma above, the following pairs each have the same feasibility:

- $(s_\alpha, [\Lambda_1, \Lambda_2, \ldots, \Lambda_n])$ and $(s_\alpha, [\Gamma_1, \Lambda_2, \ldots, \Lambda_n])$;
- $(s_\alpha, [\Gamma_1, \Lambda_2, \ldots, \Lambda_n])$ and $(s_\alpha, [\Gamma_1, \Gamma_2, \ldots, \Lambda_n])$;
- and so on, up to $(s_\alpha, [\Gamma_1, \Gamma_2, \ldots, \Gamma_{n-1}, \Lambda_n])$ and $(s_\alpha, [\Gamma_1, \Gamma_2, \ldots, \Gamma_{n-1}, \Gamma_n])$.

Chaining these pairs proves the statement in the main lemma.